ABSTRACT


This report summarizes and attempts to correlate and 
evaluate quantitative tests, reported in English since 
about 1920, used to measure behavior-decrement under the
following principal conditions: altitude, vibration, 
noise, temperature, humidity, "fatigue", apprehension, 
stress, and others. The bibliography exceeds 500 refer- 
ences. In Section I tests are described, skeleton data 
tabulated, and analysis-variables discussed. Results 
obtained under altitude, noise, vibration, and tempera- 
ture are summarized. There is apparently no single 
index of general ‘psychomotor performance’. Section II 
reviews studies on 'fatigue', loss of sleep, apprehension
and stress, with concluding emphasis upon configuration
of the complex reaction pattern and motivation.
Promising scoring indices are (1) ratio of errors and 
duration to number of movements; (2) instances of 
omissions of parts of complex tasks, reflecting ‘lowered 
standard’; (3) variability of response; (4) occurrence 
of 'blocking'; disruptions in (5) timing, and (6) the
configurational pattern. Additional test-categories, 
considered beyond the scope of the report, are included 
in the Appendix. 
FOREWORD


This report was prepared by Oberlin College, 
Oberlin, Ohio, under USAF Contract No. W33-038 
ac-19047 (18876). The contract was initiated by 
Dr. J. W. Heim, MCREXDS, under the research and 
development project, initiated by Expenditure 
Order No. 696-61, and it was administered under 
the direction of the Aero Medical Laboratory, 
Headquarters, Air Materiel Command, with 
Dr. Louis D. Hartson and Dr. John L. Finan as 
supervisors. The accumulation of bibliography 
and compilation of Section I is chiefly the work 
of Mrs. Sarah C. Finan, the writing of the 
Introduction and Section I, including the evalua- 
tion of performance tests, of Dr. Finan. Dr. Hartson 
has been largely responsible for Section II.


ACKNOWLEDGEMENTS 


The authors wish to acknowledge the kindness 
of Dr. Ross McFarland for making available his own 
work as well as his personal library on altitude 
studies. Opportunity is also taken to thank 
Dr. S. S. Stevens for his kindly cooperation in 
extending the facilities of the Harvard Psycho- 
acoustical Laboratory to two of the writers (J.L.F. 
and S.C.F.). Dr. F. C. Bartlett has also been 
especially helpful by sending unpublished reports,
as well as reprints, of the Cambridge University 
studies. 


USAF-TR-5830 ii 




TABLE OF CONTENTS 


INTRODUCTION 
Nature and Purpose of the report 
Scope of the study 
Technique of the study 


SECTION I - DESCRIPTION AND TABULATION OF PERFORMANCE TESTS


Plan of discussion 
The problem of classifying the tests 
Groups of tests 



Tests of simple reaction time 
Tapping tests 
Tests of static steadiness - Arm-Hand
Tests of static steadiness - Body Sway
Steadiness aiming and 
Tests of dynamic equilibrium 
Aiming, spearing, and allied tests 
Tests of manipulation and dexterity 
Path-tracing tests 
Dotting tests 
Pursuit tests 
Discrimination reaction-time tests 
Naming tests 
Card-sorting tests 
Cancellation tests 
Substitution tests (Code) 
Computation tests 
Tests of perceptual judgment 
Miscellaneous tests of visual perception 
Tests of visual perception span 
Tests of fixation (Immediate Memory) 
Tests of memory and learning 
Tests of associative relations and 
reasoning 
Tests of perseveration (Change of Set) 
Miscellaneous performance tests 
Complex tests simulating some aspect of 
flight performance 


Summary of results and general conclusions 
Summary of results and conclusions on conditions of altitude, noise, vibration, temperature and humidity
Other conclusions 


SECTION II - STUDIES OF FATIGUE, LOSS OF SLEEP, APPREHENSION AND STRESS
Introduction
Studies with tests administered after subjects have engaged in activity
1. After sleep deprivation
2. After hours of driving
3. After repetitive work
Studies of deterioration during the progress of the task itself
Studies of stimuli introduced to heighten stress
Studies of deterioration in the performance of tasks simulating flight
Studies of operational fatigue
Summary and interpretation of the studies of fatigue, apprehension and stress
APPENDIX
A-1 Tests of visual function
A-2 Critical flicker frequency tests
A-3 Tests of auditory function
A-4 Tests of other sensory functions
A-5 Measures of physiological correlates
A-6 Tests of eye-movement and frequency of blinking
A-7 Strength of grip tests
A-8 Tests of general intelligence
BIBLIOGRAPHY
List of Journals searched systematically
Report Series
General sources
Bibliographical list




A REVIEW OF REPRESENTATIVE TESTS USED FOR THE 
QUANTITATIVE MEASUREMENTS OF BEHAVIOR-DECREMENT 
UNDER CONDITIONS RELATED TO AIRCRAFT FLIGHT


INTRODUCTION 


Nature and Purpose of the Report. 


This report attempts to summarize, correlate and evaluate 
quantitative tests which have been used to measure psychological 
performance under a variety of conditions similar to those en- 
countered in flying. Tests included are those which are repre- 
sentative of efforts to quantify performance under the following 
conditions: altitude, vibration, noise, temperature, humidity, 
fatigue, apprehension, stress, drugs, dietary factors and others. 


Scope of the Study 


The extremely large volume of literature found in this field
necessitated a delimitation of the materials to be included. 


No attempt has been made to include tests designed primarily 
for selection and classification of individuals. Testing instru- 
ments devised primarily for this purpose, while by no means without 
bearing on the problem of performance, are usually constructed on 
different principles. For the task in hand it has been assumed 
that the performance test differs at least in emphasis from the 
predictive index in that its primary goal is the isolation of 
basic functions of behavior. Partly for this reason the perfor- 
mance test does not, as a rule, have its rationale in a job analysis 
of some complex task, and is not likely to be systematically inter- 
pretable if it is no more than a ‘miniature situation’ or 'work- 
sample’ whose justification is that it correlates with a prescribed 
‘criterion’. Instead, the validity of a test of performance is 
established by demonstrating its covariation with the environmental 
condition under investigation. Even though the same test is 
occasionally used for both purposes, important differences between 
the two kinds of instruments are obscured unless this distinction 
is borne in mind. 


Tests which were regarded as primarily physiological¹ were
excluded from the report. Here again the distinction is not an
absolute one, since the areas of psychology and physiology are
overlapping to a considerable degree. When a physiological
process is of interest mainly as an index of impairment² of more
complex behavior functions, it has been regarded as of secondary
importance and included in the appendix. Critical Flicker
Frequency is an example in point since workers in this field of
research have exploited it, in the main, not merely as a test of
visual function, but as a quantitative measure of 'fatigue'.
Simple sensory functions represent another category of behavior
excluded from detailed consideration. A sample of sensory tests
has been appended largely on the ground that psychologists are
sometimes forced to consider sensory acuity in order to eliminate
impairment in this factor, which is a necessary condition of almost
all behavior.

¹ See tables A-5 and A-7 in appendix.
² See tables A-2 and A-6 in appendix.


At the other extreme of behavioral complexity, intelligence 
and personality tests have been included only to a minimum extent. 
This exclusion is based in part on their concern with problems of 
predicting individual differences. Where, however, types of items 
have been broken out of intelligence tests for separate study as 
possible unitary factors, they have, in representative cases, been 
included in the report. 


Clinical tests, while they yield many leads as to behavioral 
processes affected by impairment of the nervous system, or of 
other integrative mechanisms of the organism, have been omitted as 
a group because of their non-quantitative character. Tests of
'abstraction', 'categorization' and similar tests primarily validated
against clinical evidence, might profitably be made the subject of 
a separate report. 


The literature emerging from World War I has been covered only 
sporadically on the ground that ‘promising’ tests developed during 
this early period have since been exploited and results incorporated 
into the more recent literature. 


Within the field as thus restricted, the literature in English⁵
has been systematically covered through December, 1948. The range 
of information embraced will be seen to be extremely broad, but 
perhaps no more so than a preliminary attack on this field justifies. 


Technique of the Study


Abstracts from the relevant articles included data on the 
nature of subjects, conditions, tests employed and methods of 
analysis, together with special features of a particular study.


³ See tables A-1, A-3 and A-4 in appendix.
⁴ See tables A-8 and A-9 in appendix.
⁵ A few studies in French and German are included. References searched
in foreign languages yielded only a small number of articles, which
were, for the most part, non-quantitative.
Articles were searched for information according to a set of
general categories adopted as a result of reading a preliminary 
sample of materials to be covered: 


1. Subject-variables included, in addition to number of 
individuals studied, such background factors as age, sex, educa- 
tional status, occupational status, previous experience related 
to test, and the like. Whenever the information was available, 
note was made of the manner in which the population studied was 
selected, and of possible controls exercised on the subjects’ 
living regime during the course of an extended experiment. 


2. Condition-variables included not merely a statement of 
the gross stimulating circumstance, but how it was obtained, in 
what degree it was present, its duration, and the like. In 
studies of altitude, for example, it is important to know, in 
addition to height or partial pressure of oxygen, duration of 
ascent, length of stay at altitude, and method of producing the 
condition, whether by means of a decompression chamber, a re- 
breathing apparatus or actual climbing or flying. Similarly, 
in alcohol studies, as noted by Jellinek and McFarland (1940),
account must be taken of such factors as "modus of alcohol 
administration (standard dosages, dosages per kilogram of body 
weight, oral or intravenous administration, amount, dilution, 
disguise of drink, rate of drinking, etc.), the time of alcohol 
administration in relation to food intake, rest... the time 
between alcohol ingestion and test observation....and many other 
factors". Other conditions, such as temperature, noise and vibration,
required that corresponding information be noted when available.


3. Test-variables were of primary significance and embraced
such factors as the design of the experiment, additional tests, 
if the particular test was one unit in a battery, its position 
in the battery, instructions given to the subject, apparatus 
employed, characteristics of response, and methods of recording. 
In addition, practice effects, length of test, motivation of 
subject, and the like, were noted. 


4. Analysis-variables included among other things the index 
of scoring, the validity, sensitivity and reliability of the test. 
The index of scoring selected for measurement may often differ 
within a single test. In tests of complex-reaction time, for 
example, it is likely to be crucial whether a decrement is 
expressed in terms of time or errors. Validity in performance 
tests, as noted above, refers to the correlation between behavior 
and the experimental condition. A test for the effects of altitude 
is thus valid if it can be demonstrated that the single variable 
or isolated complex of variables which is identifiable yields a 
given degree of performance at one height, and another, at a 
different height. The greater the range of the condition over 
which such correspondence obtains, other factors equal, the greater 
the sensitivity of the test. An adequate evaluation of reliability 
rests on a number of factors in addition to a correlational value
expressing the consistency of the test as a measure of performance. 
Information should be gathered concerning the probable error of the 
correlation, its derivation from intra-test or inter-test data, 
number of subjects on which a reliability coefficient is based, trial 
sequences correlated, number of trials employed in the calculation, 
and other factors. Since relative susceptibility or insusceptibility 
of a test to practice bears importantly on its reliability, this 
feature should be noted. Whether the reliabilities were actually 
estimated under a particular condition, or independently, was regarded 
as important. Finally, interpretation of the data was given special 
attention either with respect to the investigator's hunch as to the
function being measured when offered, or, more importantly, with 
respect to intercorrelations with other tests. 
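
As a rough illustration of the analysis-variables described above, the following sketch computes a product-moment correlation between an experimental condition and a performance score (validity, in the sense used here), together with the classical probable error of that correlation, taken as 0.6745(1 - r^2)/sqrt(n). The altitude and score values are invented for illustration only and come from no study in this review; the function names are likewise assumptions.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r, n):
    """Classical probable error of a correlation coefficient."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Hypothetical data: simulated altitude (feet) and a performance score.
altitude_ft = [0, 5000, 10000, 15000, 20000]
score = [50, 49, 46, 41, 33]

r = pearson_r(altitude_ft, score)
pe = probable_error_r(r, len(altitude_ft))
print(f"r = {r:.3f}, probable error = {pe:.3f}")
```

With so few observations the probable error is large relative to what a full study would require, which is one reason the number of subjects behind a coefficient is listed among the items to be recorded.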


In practice, few articles were found in which all the information 
demanded by these desiderata was given. Comparison between tests is 
probably unjustified without equation of all of these factors, yet 
there appears to be some point to making a preliminary assessment of 
the accumulated evidence, on a broad basis. On this assumption, tests
and conditions were therefore grouped together and compared, in most 
cases, without complete regard for these requirements. Both the mass 
of evidence and the inadequacy of experimental accounts frequently 
preclude the possibility of rendering experimental data into common 
terms. Results have therefore usually been reported as positive or 
negative, depending on the stated conclusions of the experimenter. 
Refinement of analysis beyond this point, while highly desirable, 
would be incompatible with an effort as comprehensive as the present 
review. It is significant that consistency of results, both of tests 
and conditions, appears despite the inadequate founding of many of 
the studies. The discussion which follows is largely based on a 
series of tables summarizing results by type of test, and later by 
conditions. The classification of the data required by tabular 
presentation should not be construed as having any purpose beyond that 
of an expository device. An attempt has been made to include within 
the tables features of the data which are important to their evaluation. 


SECTION I 
DESCRIPTION AND TABULATION OF PERFORMANCE TESTS 
Plan of discussion 


The plan of discussion for each group of test results usually 
includes a brief consideration of some of the major factors requiring 
control. Next, brief descriptions of the essential features of the 
procedure, the apparatus, task required of subject, and the like, 
are given for those tests which are deemed most representative 
of the group. Following this, findings obtained under the several 
conditions are considered. The distribution characteristics for 
one or more representative tests are next given. Finally, inter- 
correlational, factor analytic, and other interpretive data are 
brought to bear on the problem of interpreting the nature of the
psychological functions measured by the test. 


The Problem of Classifying the Tests 


The status of information in the field of performance testing 
does not, at present, suggest a consistent classification in terms 
of basic psychological components. More or less arbitrary classi- 
fications have been employed in the past by Whipple (1914), Muscio 
(1922), Garrett and Schneck (1933), and more recently by Melton 
(1947) and by Guilford (1947). In the discussion to follow, a 
number of possible organizing principles have been relied on, but 
without any attempt at a rigorously systematic classification of 
tests. Classified according to the kind of stimulus presentation, 
tests range from those demanding simple sensory functions, to those 
which depend on more complex principles or symbolic factors. In 
the latter, ‘interpretability’ of the presentation is stressed, 
with a minimization of ‘acuity’ and 'discrimination' factors which 
are emphasized in the former. Classified in terms of response, 
tests vary from those which stress relatively simple motor functions, 
to others in which the reaction is of a highly complex verbal or 
'ideational' type. Further significant distinctions in response
based on types of movement, bodily members involved, discrete or 
non-discrete character, will be developed as needed in the ensuing 
discussion. Non-discrete movements are divided into those which 
are repetitive, serial and continuous (Fitts 1947) when appropriate 
to differentiation of the tests. 


Learning, which provides a further basis of test classification, 
may, for purposes of this report, be regarded as a long-sectional 
dimension of behavior that has relevance for any performance in so 
far as it may be modified by practice. Motivation is similarly a 
category applicable to performance in general. It is apparent that 
any measure of performance makes demands on the sensory, motor and 
coordinating capacities and reflects the learning and motivation of 
the individual under test. At most, therefore, any one of these 
factors may be examined while the remaining ones are minimized or 
held constant. Tests are usually identified by little more than 
common features of the test situation, and distinctions observed in 
the discussion to follow, hew closely to differences in testing 
operations and procedures. The order of presentation of the tests 
will be seen to move roughly from the simple to more complex psychologi- 
cal functions, although inversions resulting from the many types of 
possible variation within the same testing situation, can be observed. 


Groups of Tests 
1. Tests of Simple Reaction Time 


Tests in this group have as a common characteristic the 
requirement that the subject respond as rapidly as possible to the 
discrete presentation of a single stimulus. In the typical simple 
visual reaction time situation the subject responds to the onset of 
light by depressing a key. The character of the stimulation is not
limited to visual cues, however, and is frequently based on auditory, 
tactile, or on other sensory fields. Responses chosen for measure- 
ment include, in addition to movement of the hand, that of the eye, 
mouth (word-reaction), foot or toe, or of the body as a whole. 
Normative data comparing response latencies for several sensory 
modalities are given by Forbes (1945), and for response-members by 
Seashore and Seashore (1941). While the movements involved are 
relatively simple, they differ from test to test and may, for example,
involve lifting the finger from a key already depressed, or moving the 
hand from a designated place on the apparatus to a key. Latency of 
response is standardly measured with a chronoscopic device. A variety 
of methods employed in the measurement of simple reaction time are 
described by Miles (1931). 


Consideration of the accompanying Table 1, Part I, which 
summarizes the influence of the selected environmental conditions on 
simple reaction-time tests of various kinds, shows that increased re- 
action latencies have been obtained with extended periods spent at 
high altitudes (17,000 - 20,000 feet) (McFarland 1932, 1937), although 
Wespi (1933, 1936) failed to note decrement at approximately 16,500 
feet in a U-chamber. Five studies of sleep privation (Patrick and 
Gilbert 1896; Lee and Kleitman 1923; Cooperman, Mullin and Kleitman 
1934; Tyler 1947; Edwards 1941) are consistent in showing no decrement 
in simple reaction time. A study by Jones et al (1941) demonstrates 
an increase in response-latency in truck drivers as a function of hours
of driving. Alcohol (Varé 1932) and morphine (Macht and Isaacs 1917) 
depress the function. Vibration yielded no effect (Coermann 1939).
Additional results obtained under various conditions may be noted 
in Table 1. 
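
The positive-or-negative reporting convention adopted in this review can be sketched as a simple tally. The entries below restate only the outcomes named in the paragraph above for Table 1, Part I. Treating "McFarland 1932, 1937" and "Wespi 1933, 1936" as two studies each is an assumption, and the data layout and function name are illustrative rather than the authors' own procedure.

```python
from collections import defaultdict

# (condition, study, decrement reported?) as stated in the text above.
RESULTS = [
    ("altitude", "McFarland 1932", True),
    ("altitude", "McFarland 1937", True),
    ("altitude", "Wespi 1933", False),
    ("altitude", "Wespi 1936", False),
    ("sleep privation", "Patrick and Gilbert 1896", False),
    ("sleep privation", "Lee and Kleitman 1923", False),
    ("sleep privation", "Cooperman, Mullin and Kleitman 1934", False),
    ("sleep privation", "Tyler 1947", False),
    ("sleep privation", "Edwards 1941", False),
    ("hours of driving", "Jones et al 1941", True),
    ("alcohol", "Varé 1932", True),
    ("morphine", "Macht and Isaacs 1917", True),
    ("vibration", "Coermann 1939", False),
]


def tally(results):
    """Count positive (decrement) and negative (no decrement) findings per condition."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for condition, _study, decrement in results:
        counts[condition]["positive" if decrement else "negative"] += 1
    return dict(counts)


summary = tally(RESULTS)
for condition, c in summary.items():
    print(f"{condition:18s} positive={c['positive']} negative={c['negative']}")
```

A tally of this kind records only the direction of each stated conclusion; it says nothing about the size of an effect or the adequacy of the study behind it.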


In Part II of Table 1 are presented results obtained with 
variant procedures involving responses other than manual. Alcohol 
(Miles 1924) depresses reactions of the present type as do lack of 
Vitamin-B during the period of acute deficiency (Brozek et al 1946), 
and 'flagging of attention' (Travis and Kennedy 1947). With carbon
monoxide (Forbes, Dill et al 1937), except in concentrations of 30% 
or higher, no effects were noted. 


Reliability of a simple reaction-time test was computed as .72 
by Farnsworth, Seashore and Tinker (1927) in a standardization study. 
Seashore and Seashore (1941) in an independent study give reliability 
coefficients of .87 - .90 (uncorrected) for simple hand and foot 
responses to an auditory stimulus. The generally accepted observa- 
tion that reaction time is relatively slightly influenced by practice 
may suggest one factor contributing to its relatively high reliability. 
Distraction, however, is reported to be capable of producing a marked 
diminution in reliability of simple reaction-time, at least under 
some conditions. 
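
The text does not say how each of the coefficients above was derived. As one illustration, an uncorrected odd-even (split-half) coefficient could be obtained by correlating each subject's mean over odd-numbered trials with the mean over even-numbered trials. The reaction times below are invented for illustration, and the function names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(trials_by_subject):
    """Uncorrected split-half coefficient from per-subject trial scores."""
    odd_means = []
    even_means = []
    for trials in trials_by_subject:
        odd = trials[0::2]   # trials 1, 3, 5, ...
        even = trials[1::2]  # trials 2, 4, 6, ...
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    return pearson_r(odd_means, even_means)


# Hypothetical reaction times in milliseconds, six trials per subject.
reaction_ms = [
    [180, 185, 178, 182, 181, 184],
    [210, 205, 215, 208, 212, 207],
    [195, 199, 192, 197, 196, 194],
    [230, 226, 234, 229, 231, 228],
]

r_half = odd_even_reliability(reaction_ms)
print(f"odd-even reliability (uncorrected) = {r_half:.3f}")
```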
Simple reaction time scores have come to be accepted by many
workers as measures of habitual tasks. According to Seashore, Buxton 
and McCollom (1940) a group factor can be analyzed from a variety of 
reaction time tests. Intercorrelational data shed further light on 
the nature of the function measured by the present type of test: 
Forbes (1945) gives the correlation between response to light and 
sound as .48. Results showing relatively high intercorrelations of 
various types of simple reaction tests are reported by Farnsworth, 
Seashore and Tinker (1927). The same investigators report high 
correlations between right and left hands, as well as between forms 
of the test in which the subjects are prepared and unprepared. 
Results agree that simple reaction-time shows no relation to general 
intelligence (Sisk 1926; Farnsworth, Seashore and Tinker 1927). In 
the same study Sisk (1926) also shows absence of correlation with 
such complex tests as substitution or card-sorting. According to 
data obtained by Seashore, Buxton and McCollom (1940), the auditory 
reaction has little or no correlation with single- or double-plate 
tapping; the visual form, however, showed a low degree of relation- 
ship with single-plate tapping. Further evidence from the same 
experiments showed no correlation of simple reaction time with a 
discrimination reaction time test and with a pursuit rotor test. A 
part of the importance of tests of this type stems from the widely 
held, but unsubstantiated, hypothesis that complex tests of a sensori- 
motor type are built up from simple reaction units. 


2. Tapping Tests 


Tests falling under the category of tapping appear to differ
from those of simple reaction time significantly in the respect that 
responses must be repeated in immediate succession. As an implication 
of this difference, it is reasonable to suppose that each successive 
response in the case of tapping is dependent on stimulus cues produced 
by the preceding response. Both appear to be primarily tests of speed 
with minimisation of a precision factor. The variety of methods 
employed in the literature varies from simple finger oscillation to 
complex simultaneous movements of the two hands. This diversity of 
techniques makes it difficult to generalize about tests of tapping. 
Representative tests will be found in the following sources: one-plate, 
with stylus technique, Whipple (1914); counter or telegraph-key, 
Garrett and Schneck (1933); two-plate, with stylus, Dunlap (1921), 
Garrett and Schneck (1933); two-plate with finger, Finan and Malmo 
(1944); two telegraph keys, Stevens (1941); two-plate with both hands, 
Birren and Fisher (1945). Revisions in the design of apparatus and 
technique for measurement of tapping represent efforts to control such 
variable factors as ‘tremor tapping’ (minimized by requiring alternate 
response on each of two plates), differing leverages on stylus 
(minimized by responding directly with the finger), and sliding from 
plate to plate (minimized by separating the plates by means of a raised
barrier). In the standard version of the tapping test the subject is 
required to respond with a stylus held in the preferred hand by
alternately tapping on two plates separated by a short distance. 
Number of taps per unit time is recorded on an automatic counter 
which is actuated by contact of the stylus with the plates. 


Results obtained with various types of tapping tests are 
arrayed in Table 2. Under actual or simulated altitude, deficit is 
reported by three investigators (Bagby 1921; Lowson 1923; Malmo and 
Finan 1944), but only in advanced stages or under rather extreme 
conditions. Seven studies of sleep privation agree in reporting
absence of effects on tapping performance (Patrick and Gilbert 1896; 
Robinson and Herrmann 1922; Husband 1935; Katz and Landis 1935; 
Warren and Clark 1937; Edwards 1941; Tyler 1947). Alcohol is reported 
to have a depressing effect on speed of tapping (Hollingworth 1923-24; 
Miles 1924), while benzedrine has been found to facilitate it slightly, 
by several investigators (Carl and Turner 1939 and 1940; Thornton,
Holck and Smith 1939; Simonson and Enzer 1941). Flory and Gilbert 
(1943), while obtaining findings with benzedrine similar to those 
reported above, point to suggestion as a possible factor in the deter- 
mination of the result. Results with caffeine parallel those 
considered immediately above, showing a slight stimulating effect on 
the present type of performance (Hollingworth 1912; Flory and Gilbert 
1943). Other data summarized in Table 2 indicate performance deficits 
under dietary deficiencies (Taylor et al 1945; Glickman et al 1946) 
and cold (Keeton et al 1946; Mitchell et al 1946), and absence of clear-cut
impairment with smoking (Hull 1924), carbon monoxide (Dorcus and
Weigand 1929), adrenaline (Jersild and Thomas 1931) and noise (Stevens 
1941). 


Reliability coefficients are given by Muscio (1922-23) for the 
one-plate form as .86 (trials 1 - 4) and .92 (trials 21-24). Malmo 
and Finan (1944), using a two-plate, non-stylus apparatus, calculate 
an intratest coefficient of .94 (corrected .97); under test-retest
conditions, however, the reliability depreciated to .37 for trials
1 and 2, rising to .76 for trials 2 and 3. Melton (1947) has recently
reported for a two-plate stylus technique, an intratest value of .96, 
based on the first three minutes of tapping, and a coefficient of 
.94, based on the last three minutes of an eight minute period. 
According to Muscio (1922-23) the reliability of the test increases
once the limit of improvement is reached. In general, investigators 
are agreed that during a brief initial period the tapping test is 
highly susceptible to practice. 
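
The source does not state which correction produced the "corrected" value quoted above. Assuming it is the Spearman-Brown step-up for a test of doubled length, the sketch below reproduces the reported change from .94 to .97. The function name and the length factor parameter are illustrative.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimate full-test reliability from a part-test correlation.

    length_factor is the ratio of the full test length to the length of the
    part on which r_half was computed (2 for an odd-even split).
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


corrected = spearman_brown(0.94)
print(round(corrected, 2))  # 0.97
```

The agreement with the quoted figure is consistent with this assumption but does not confirm that the original investigators used it.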


Intercorrelations between single performances on two- and three- 
plate tests have been demonstrated by Seashore, Buxton and McCollom 
(1940) to yield an interpretable pattern, with the least inter- 
relationship between the one- and the three-plate techniques. In the 
same study vertical telegraph key tapping correlated considerably 
higher with horizontal telegraph key tapping than with the one-plate 
tapping, diminishing with the two- and still further with the three- 
plate apparatus. Moderate correlation values are reported by 
Melton (1947) between tapping and discrimination reaction, two-hand
coordination, and the SAM complex coordinator. As noted
previously, it has been shown by numerous experiments that little 
relationship obtains between tapping and simple reaction time. A 
low correlation has been found between tapping and body sway 
(Seashore, Buxton and McCollom 1940). A factor analytic study by 
Melton (1947) indicates that tapping performance has a unique 
loading interpreted as a ‘wrist’ factor. Seashore (1940) reports 
two factors involved in tapping: one for movement within a single 
plane measured by the telegraph key or one-plate apparatus, another 
sampled by the multiple-plate technique in which at least a minimum 
of precision is a requirement. A further finding of little or no 
communality between tapping and tremor suggests that the two types 
of response may be differentiated as voluntary and involuntary. 


3. Tests of Static Steadiness 


On the level of testing operations, steadiness differs from 
the sensori-motor tests considered up to this point significantly in 
its emphasis on precision rather than speed. Tests of the present 
type are considered to measure control of fixed movement, or amount 
of uncontrolled movement which occurs when the hand, arm, finger, 
head, or other reaction systems, including the entire body, are held
as nearly motionless as possible in a specified position for a fixed 
time. 


Arm-hand steadiness is the most frequently used form of steadiness
test. In the representative manual steadiness test a needle-stylus 
of constant diameter is inserted and held for a fixed time in each of 
a series of calibrated holes of varying size, drilled in a sloping 
panel. The stylus must be held toward the center of the hole in order 
to avoid actuating a counter which automatically records number or 
duration of contacts. Descriptions of commonly used versions of this 
test are to be found in Whipple (1914), Dunlap (1921), Garrett and 
Schneck (1933), Jones et al (1941), Finan and Malmo (1944), and
Melton (1947). Factors which vary from test to test are: (1) diameter 
and effective length of stylus; diameter and number of holes; (2) index 
of scoring, whether in terms of time or number of contacts; (3) distance 
of panel from subject; (4) illumination of holes; (5) angle at which 
stylus is inserted; (6) progress of the trials from small to large, 
or large to small, or randomized, with multihole tests; (7) subject's 
knowledge of results; (8) duration of trial period; (9) number of 
trials; (10) periodicity of trials; and (11) duration of intertrial 
periods. 


Diverse tests of stationary steadiness appear from Table 3 to 
show uniformly some degree of decrement under many of the conditions 
examined. With simulated altitude, five studies show a decrement 
beginning at 12,000 feet, becoming progressively more pronounced with 
higher altitude or longer exposures (Malmo and Finan 1944; Eckman
et al 1945; Otis et al 1946; Rahn et al 1946; Rahn and Otis 1947).
The sensitivity of tests of this type to altitude appears sufficient
to yield superimposed effects resulting from dietary changes
(Eckman et al 1945). A steadiness test has also been shown by
Otis et al (1946) to be sensitive to graded effects of hypocapnia
induced by a pneumolator at 30,000 feet. Studies of sleep privation 
are consistent in showing no decrement (Cooperman et al 1934; 
Edwards 1941; Tyler 1947) except when the subjects actually fall 
asleep over the test. Carbon dioxide excess coupled with oxygen 
decrease has been shown by Consolazio et al (1947) to yield a 
significant decrement, which, in the absence of controls, may be 
interpreted as an artefact of heavy breathing movements. Slight, 
though in some cases unreliable, decrements are shown as well under 
conditions of smoking (Hull 1924; Fisher 1927), carbon monoxide gas 
(Dorcus and Weigand 1929), hours of driving (Ryan and Warner 1936; 
Jones et al 1941), and dietary privations (Berryman et al 1947). The 
effect of noise on the steadiness function was indeterminate (Stevens 
1941), as was that of ‘verbal stress’ (Melton 1947). Vibration, 
according to Coermann (1939), results in little impairment. Three 
related studies on the influence of low temperature showed a highly 
variable response decrement which might be interpreted as a physio- 
logical artefact due to finger stiffening. Additional effects are 
summarized in Table 3. 


Reliability coefficients of steadiness have been generally 
reported as high. Kellogg (1932) gives an odd-even value of .94 - 
.98 (corrected); Paulsen (1935) reports an odd-even value of .98 
(corrected) and a test-retest value of .73. The lower retest value 
is attributed by this worker to random individual variations rather 
than to learning. Malmo and Finan (1944) have shown that reliability 
of a multiple-hole arm-hand steadiness test varies with size of hole, ranging 
from .88 - .97 (corrected). Test-retest values according to these 
workers vary between .43 - .84. The effect of practice, according to their 
findings, is practically negligible. Melton (1947), using a one-hole technique, 
has calculated intratest reliability values at .76 - .85 (uncorrected). 
The test design which appears to incorporate most features of control 
is that of Melton (1947); as an additional advantage, his version has 
been standardized on more subjects than other tests. An interesting 
variant of this test involving the simultaneous use of the two hands 
has been developed by the same investigator and shown to be reliable 
under certain conditions of testing. 
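
The distinction between ‘corrected’ and ‘uncorrected’ odd-even coefficients can be made concrete with a short sketch. It assumes that ‘corrected’ refers to the Spearman-Brown step-up of a split-half correlation, which the source does not state explicitly; the function names and the sample data are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, factor=2.0):
    """Estimate the reliability of a test `factor` times as long as the halves."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def odd_even_reliability(trials_by_subject):
    """Return (uncorrected, corrected) odd-even reliability.

    trials_by_subject: one list of trial scores per subject
    (hypothetical data layout).
    """
    odd = [sum(trials[0::2]) for trials in trials_by_subject]   # trials 1, 3, 5, ...
    even = [sum(trials[1::2]) for trials in trials_by_subject]  # trials 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)


# Made-up contact counts for four subjects over four trials each.
scores = [[3, 4, 2, 5], [9, 8, 10, 9], [6, 5, 7, 6], [1, 2, 1, 3]]
uncorrected, corrected = odd_even_reliability(scores)
```

Under this assumption a corrected value always exceeds the half-test correlation from which it is derived (for positive correlations below 1), so corrected and uncorrected coefficients reported by different workers are not directly comparable.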


While the basic nature of the steadiness function remains 
obscure, intercorrelational data serve to shed some light on its 
characteristics. Melton (1947) reports a high (r = .85) degree of 
relationship between steadiness scores of the two hands measured 
simultaneously. Seashore (1940) reports a high correlation between 
hand tremor and stationary steadiness. In opposition to this 
finding is Melton's (1947) evidence that Air Corps candidates who 
showed obvious signs of tremor, as observed clinically, made average 
or better-than-average scores on a steadiness test. The latter 
finding would appear to contrast the voluntary character of the 
steadiness function with the involuntary nature of tremor. Arm- 
hand steadiness is reported by Seashore (1940) to be positively 
related to other indices involving control of fixed movements, as 
postural sway, rifle muzzle sway, steadiness thrusting and target- 
shooting. Intercorrelational values of these measures with steadi- 
ness are .45 or higher. According to Melton (1947) correlations 
with complex coordination, two-hand coordination, discrimination 
reaction and finger dexterity are low. A factor analytic study 
reported by the same source, in which other tests of dexterity were 
compared with steadiness (time of contact), indicated that this 
function is loaded in a single factor which was not clearly defined. 
It was not loaded with dexterity or perceptual factors. Marking 
and dotting showed loadings similar to that of steadiness performance. 
According to the theoretical analysis of Brown and Jenkins (Fitts 
1947) two components of static reactions which appear to be most 
significant are "the relatively minute, high frequency tremor move- 
ments, and the large, slow changes in static position". The chief 
importance of the present type of reaction appears a priori to lie 
in its relation to other complex adjustments which it makes possible. 
The need for studies investigating the influence of such variables 
as body member, position of limb, knowledge of results, as well as of 
other general factors presumed important to motor functions, on static 
steadiness reactions is indicated. 


4. Tests of Body Sway 


Steadiness of the body as a whole is measured by the 
ataxiameter. Of the several types of apparatus in fairly common use, 
the Miles (1922) ataxiameter is typical. This apparatus summates by 
means of automatic counters the anterior, posterior and lateral devia- 
tions of the head from a position determined by the fixed station of 
the subject. A modified ataxiameter developed by Edwards (1942) 
measures sway at hip and hand-arm level as well as at the level of 
the head. A more recent model (Fisher, Birren and Leggett 1945) 
simplifies the technique to measure anterior-posterior sway alone. 
Depending on the region of the body from which measurements are taken, 
differences in results may be found. In addition such factors as 
posture, height, weight, length of feet, presence or absence of visual 
cues and others must be accounted for in order to render comparable 
the results of different investigations. 


Among the results arrayed in Table 4, those obtained under 
simulated altitude yield a well marked decrement (Barach, Brookes 
et al 1943; Birren, Fisher et al 1946). Vollmer et al (1946) have 
shown that effects are no more pronounced with a combined condition 
of altitude and carbon monoxide than with altitude alone. Allied to 
these results with altitude are the findings of Consolazio et al 
(1947) in which increased body sway was found as a result of 
extended exposures to CO2 excess and oxygen decrease in sealed 
chambers; a significant improvement was, moreover, noted upon 
recovery in normal air. Inconsistent findings tending toward a 
decrement, at least under extreme conditions, have been reported 
to result from sleep privation (Lee and Kleitman 1923; Edwards 
1941b; Cooperman et al 1934; Husband 1935; Tyler 1947). Two 
studies of driving agree in showing increased sway (Ryan and 
Warner 1936; Jones et al 1941). Among studies of drug effects, 
both alcohol (Miles 1924) and the barbiturates (Tyler 1947) are 
reported to increase body sway, the latter in sleep-deprived 
subjects, while benzedrine (Tyler 1947) has been found, also 
under the condition of sleep privation, to decrease sway. Results 
obtained under noise of 115 db. were indeterminate (Stevens 1941). 
Increased variability of response has been independently remarked 
by a number of investigators working under diverse conditions. 
Intratest reliabilities reported by Seashore, Buxton and McCollom 
(1940) for the Miles technique are .80 (standing) and .89 (sitting). 


According to Fisher, Birren and Leggett (1945) the intratest 
reliability of their ataxiametric technique falls between .87 and 
.96 (4 min. and 8 min. test) under the conditions of study; the test- 
retest value is reported as .92. Their range of test scores appears 
adequate for sensitivity. Edwards (1942) reports his technique to 
be relatively insusceptible to practice effects. 


The relation between results obtained in the two postures of 
sitting and standing is given by Seashore, Buxton and McCollom (1940) 
as .07. These investigators have shown a moderate correlation of 
scores of standing subjects with tapping scores. Intercorrelations 
of ataxiametric scores with simple reaction time and serial discrimi- 
nation reaction (Seashore 1940) tend to be zero; and with pursuit 
rotor, highly negative (-.62). Studies of general motor functions 
reported by the same workers support the view of a group factor 
for steadiness, including body sway, interpreted as a precision 
factor since it includes a number of performances emphasizing 
accuracy of movement. 


5. Steadiness Aiming and Tests of Static Equilibrium 


The steadiness aiming test appears to differ from static- 
steadiness mainly in its greater emphasis on a movement factor which 
cooperates with steadiness in the joint determination of performance. 
In tests of the present sort a bodily member is moved from a fixed 
position to a prescribed position at a distance from the subject. 
Variations of the technique are distinguishable mainly on the basis 
of the bodily members and movement-coordinations required by the 
task. 


The standard tests of steadiness-aiming may employ the same 
apparatus or one closely similar to that used in the steadiness 
test. The subject may thrust a stylus, or direct a pivoted stylus, 
into a hole or a series of holes. Scores are taken in terms of 
‘misses’ which are recorded by a counter or a cumulating device, 
actuated by contact of the stylus with the area immediately surround- 
ing the target. Descriptions of representative techniques are 
given by Whipple (1914), Dunlap (1921), Seashore and Adams (1933), 
Malmo and Finan (1944), and Melton (1947). Factors necessitating 
standardization are closely allied to those discussed under steadi- 
ness testing. The technique of Melton appears to incorporate the 
greatest number of features of control. 


With a single exception, use of tests of this type had been 
limited to the condition of altitude, which has yielded consistent 
deficit in performance (Bagby 1921; Grether and Smith 1942; Gagne 
and Smith 1943; Loucks 1944; Malmo and Finan 1944). According to 
the findings of Malmo and Finan (1944) steadiness-aiming proved to 
be less sensitive as an index of anoxia than stationary steadiness. 
Impairment was not increased when subjects receiving sulfadiazine 
were subjected to altitude. A decrement in performance was found by 
Ryan and Warner (1936) following prolonged driving of an automobile. 


Reliability coefficients reported for this test are substantially 
similar in value to those obtained with tests of static steadiness 
uncomplicated by aiming movements. Malmo and Finan give an intratest 
reliability range from .66 - .80 (corrected) for their version of the 
test. Somewhat higher values are given by Melton for his technique: 
.92 - .96; however, reliability diminished from a test-retest value 
of .76 at ground to .32 under actual testing conditions involving 
altitude. That steadiness-aiming is somewhat more susceptible to 
practice than static steadiness is borne out by the data of Grether 
(1942) as well as by that of Malmo and Finan (1944). Grether further 
notes high variability of performance and a wide range of scores on 
his test. 


As a step toward interpreting the steadiness-aiming function, 
Melton has shown a high correlation (.66) between steadiness-aiming 
and stationary steadiness. It may be significant that the steadiness- 
aiming test proved to have low validity for the prediction of success 
in pilot training (Melton 1947); nor was its correlation with this 
criterion improved by adding ‘verbal stress’, a change which apparent- 
ly did not alter the character of the test. Intercorrelations 
reported by the same investigator between steadiness-aiming and other 
tests under altitude conditions were: .20 with single dimension 
pursuit test, -.04 with peg moving, .21 with addition and .06 with 
code substitution. 


The factorial composition of the steadiness-aiming function 
appears to be similar to that of stationary steadiness (Seashore 1940). 
A final interpretation of the relationship between the two functions 
must, however, await more complete evidence, since tests have not, 
for the most part, been designed to separate those components 
which on common-sense grounds appear to differentiate ‘static’ 
from ‘positioning’ reactions. 


Tests of Dynamic Equilibrium 


Tests falling under the category of dynamic equilibrium 
appear to bear somewhat the same relation to steadiness-aiming 
as tests of body sway do to static-steadiness. Examples are 
provided by the ‘Wobblemeter’ described by Hunt (1936) which 
involves keeping the balance while standing on a small platform, 
and the ‘stabilometer’ developed by Travis (1944a). McFarland 
and Barach (1937) in attempting to distinguish between performance 
of normal and psychoneurotic subjects under anoxia, found the 
‘wobblemeter’ to be unsuitable for the purpose since many of the 
subjects became too dizzy to take the test. 


According to Travis (1945) little relationship exists between 
measures of static and dynamic equilibrium. He also reports little 
or no correlation between stabilometer performance and pursuit. 
Center of gravity of the body is proposed as a unique factor 
measured by the present type of test, and not by tests of static 
equilibrium. 


6. Aiming, Spearing and Allied Tests 


Closely allied to tests of steadiness-aiming, is a second 
group concerned with accuracy of movement, which includes such 
varied performances as aiming at a target with a spear, dart throwing, 
rifle shooting, ball tossing, and three hole coordination. The 
hypothesis might be ventured that this group of varied tests differs 
from those of steadiness-aiming chiefly in its allowance of greater 
freedom of movements of the ‘positioning’ type. 


The Whipple Target Test (1914) involves striking at small 
crosses, randomly placed on a target, with a pencil held in the 
subject's hand. The Muscio Spearing Test (1922) differs mainly 
in the substitution of a small spear for the pencil and the use 
of concentric circle targets. In the dart-throwing test the subject 
attempts to hit the center of a target from a prescribed distance. 
Ball-tossing with both hands has also been used as a measure of 
complex aiming-throwing coordination. 


Except for the study of Jones et al (1941), who found a non- 
graded decrement in target-aiming after long hours of driving, 
and that of Tyler (1947), who demonstrated impairment on a prolonged 
rifle marksmanship test with severe sleep privation, results 
obtained with tests of the present type have been uniformly 
indeterminate or negative. Reliability appears to be difficult to 
achieve with these tests because of relatively large and continued 
practice effects (Muscio 1922; Landis 1935). Seashore (1940) 
has reported correlations of rifle muzzle sway during sighting, 
and marksmanship, with indices of steadiness in highly practiced 
subjects. That these tests are highly susceptible to motivational 
factors, such as knowledge of results, competition, and standards 
set by the experimenter, has been demonstrated by Mace (1935). 
A possible advantage claimed for this type of test is, however, 
that it generates more interest than many other types of test. 


The ‘three-hole test’ is a variant of the present group of 
tests which requires that the subject insert a stylus successively, 
as rapidly as possible, into each of three holes arranged in a 
triangular pattern on an inclined panel. Scores are recorded 
automatically by counters each time a contact is made between hole 
and stylus. The few studies found which made use of this test 
showed the following tendencies: impaired performance with alcohol 
(Hollingworth 1923-24) and with 'razzing' (Laird 1923); indeterminate 
results with caffeine (Hollingworth 1912), and with administration of 
oxygen to dementia praecox patients (Hinsie, Barach et al 1934). 
Garrett and Schneck (1933) report that intercorrelations of scores 
on the three-hole test with other sensori-motor performances depend 
on the stage of practice; the correlation with tapping is reported 
to rise from -.25 for the first trial to .39 for the two hundred and 
fifth trial. Reliabilities for this test, ranging from .50 - .93, 
have been found in the literature (Stecher 1916; Garrett and Schneck 
1933). 


7. Tests of Manipulation and Dexterity 


Techniques classed under the category of ‘manipulation and 
dexterity’ require combined precision and speed of performance. Such 
tests are limited to serial eye-finger, eye-hand, and eye-arm coordina- 
tions, or to combinations of these. Usually they involve a minimum 
of complication on the stimulus side. On the level of test operations, 
these may be grouped according to the type of material manipulated, 
as, for example, pegs, balls, blocks, and the like. 


In conventional peg-moving tests, a number of small pegs must be 
placed by the subject into a series of snugly fitting holes as rapidly 
as possible (Barach, Brookes et al 1943), or moved from one series 
of holes into another (Russell 1948), or removed, rotated and replaced 
in the same holes, as in the Santa Ana finger dexterity test (Melton 
1947). The score is determined by the number of seconds required to 
complete the test, or by number of pegs manipulated in a unit time 
as well as by number of errors made. In a variant form of the test 
(SAM Peg-moving Test) reported by Melton (1947) a complication was 
added to the simple placing test requiring the subject to insert the 
triangular end of a peg into a hole of corresponding shape, remove 
the peg, rotate it through 180° and then to insert the other end, 
which was pentagonal, into an appropriately shaped hole. Finally 
a push-button is pressed which lights a lamp, signaling the completion 
of the unit-task. The score is obtained by taking the number of 
complete sequences in a given time. 


Whereas tests of the type described immediately above are self- 
paced; i.e., the subject determines his own speed, a test in which 
the pacing is determined automatically by the apparatus was developed 
by Pollock and Bartlett (1932). Each time a constant speed trolley, 
moving back and forth every few seconds, approaches the subject, a 
peg is removed from or replaced on a board mounted on the trolley. 
In a more complex version, two trolleys on separate tracks move in 
alternate phase in relation to the subject. The task is to remove 
a peg from one trolley and replace it in a hole on the platform of 
the other trolley, removal or replacement being performed only when 
the trolley is in the position nearest to the subject. 


A test based on a somewhat different principle, involving 
coordinated movements of the two hands, requires dropping a ball- 
bearing successively as rapidly as possible through a pipe held 
vertically in front of the subject (Brozek 1944). A mechanical 
counting device summates the number of times the ball passes from 
the top to the bottom of the pipe. Also based on manipulation 
of balls is the ‘psychomotor coordination test’ (Weiner and Hutchinson 
1945) in which the task-requirement is to pick a number of small balls 
off a rotating disc, with a pair of forceps. 


Manipulation of cubes is utilized in the Minnesota block test 
(Green et al 1945). The task-requirement is to replace a number of 
blocks, one at a time, into a box designed to hold four layers of 
49 blocks each. Scores are derived from the number of units replaced 
in 90 seconds, or, in terms of total time consumed in replacing all 
of the blocks. 


Table 7 summarizes results obtained with manipulation and 
dexterity tests under the varied environmental conditions listed. 
Results at simulated high altitudes justify the conclusion that 
performance is impaired to a small extent under relatively extreme 
conditions (Grether and Smith 1942; Gagne and Smith 1943; Loucks 1944; 
Green 1947; Russell 1948). That these tests do not appear sufficiently 
sensitive to detect deficit with lower altitudes may be suggested by 
the findings of Barach, Brookes et al (1943), Smith, Seitz and Clark 
(1946), and Smith (1948). In line with such a view is Russell's 
(1948) finding of rapid compensation for the initial deficit during a 
55 min. stay at 18,000 feet. Jones et al (1941), using a “reach and 
grasp" test demonstrated a rapid and consistent decline in performance 
as hours of driving increased. Pollock and Bartlett (1932) demonstrated 
impairment resulting from regular noises occurring asynchronously with 
the subject's operations while performing on the double-trolley test. 
Prolonged Vitamin-B deficiency (Brozek et al 1946), and fasting 
(Taylor et al 1945) have been shown to be associated with impairment. 
Additional findings are given in Table 7. 


Grether and Smith (1942) and Gagne and Smith (1943) give 
intratest reliability coefficients of .83 to .90 (corrected) for 
the SAM Peg-Moving Test. Intertest reliabilities for the same 
technique are reported as considerably lower, .50 (trials 1 and 2) 
and .28 (trials 2 and 3). For the Santa Ana Finger Dexterity Test, 
Melton (1947) reports an intratest value of .93, and high test- 
retest values. Corrected intratest reliability of the arm-hand 
coordination test is given as .87 - .89. Brozek (1944) has found 
that the intratest reliability of the ball-pipe test ranges from 
.70 - .80 (uncorrected). In view of the evidence considered above 
it appears that tests of the present type are reasonably consistent 
measuring instruments. 


Few clues are available as to the nature of the psychological 
functions sampled by these tests of dexterity. A battery of five 
tests of the present type analyzed by Melton (1947) yielded inter- 
correlations ranging from .28 - .61. In this study it was further 
demonstrated that there were no relationships between the tests of 
dexterity employed, and tapping, steadiness, discrimination reaction- 
time, and cancellation. A dexterity factor common to performance 
on various peg-moving tests has been suggested by Melton (1947). 


8. Path-tracing tests 


Presumably related to tests of manual dexterity but differing 
in the significant respect that they involve continuous movements of 
a less repetitive sort, are the path-tracing tests. In tests of this 
type the subject must move a stylus or pencil more or less precisely 
and continuously along a narrow slit bounded on both sides. Thus, 
continuously changing motor adjustments, in response to continuously 
changing patterns of stimulation, are demanded for proficient perform- 
ance on these tasks. Movements may be linear or curvilinear, discrete 
or cursive, toward or away from the body, to the right or left, with 
the preferred or non-preferred hand. Scores are derived from the 
number or duration of contacts made with the sides during the tracing 
of the length of prescribed course, or occasionally, by noting the 
first point of contact. For details of apparatus and technique 
employed in path-tracing tests, the reader is referred to Whipple 
(1914), Garrett and Schneck (1933), and Gurnee (1939). Certain 
simple maze-patterns (Vernon 1926) in which perceptual requirements 
are held at a minimum do not appear to differ in any important 
respect from other tests falling within the present group. Depending 
on the aspects of performance selected for analysis, ‘mirror-tracing' 
tests may properly be classified under the present heading, or may 
represent more complex types of performance to be discussed later 
in the report. (See tests of ‘change of set' and learning.) 


The ‘rail-walking test’ developed by Fisher and his co-workers 
(1946, 1947) may be considered analogous to the tracing test, although 
it differs from those considered above in that movements of the entire 
body are involved. The subject is required to walk a raised rail 
1 inch wide and 10 feet long, placing heel to toe, and keeping the 
hands clasped behind the back. Performance is scored as the sum of 
distances walked in 10 trials. In the single study employing this 
test to detect possible decrement, Consolazio et al (1947) observed 
no impairment in performance with oxygen decrease and carbon dioxide 
excess in a sealed chamber. Standardization data on naval personnel 
are given by Fisher (1946) who reports a moderately high intratest 
reliability, .77 (corrected) for first trial; .85 for second trial 
for the test, although a marked practice effect has been observed. 
Scores on the ‘rail-walking test’ as determined in the same study do not 
correlate with ataxiametric scores obtained on 43 subjects. 


In Table 8 are shown, in summary, effects of a variety of 
conditions on path-tracing performance. McFarland and his co-workers 
(1937, 1937 II) have shown a decrement in mirror tracing to result 
from exposure to simulated altitude. Keeton and his co-workers 
(Keeton et al 1946; Glickman et al 1946; Mitchell et al 1946) have 
reported a decrement in performance in the cold but their reported 
evidence does not permit assessment of the influence of the additional 
factors of clothing and diet. In the acute phase of Vitamin-B 
restriction, Brozek's (1946) subjects showed impairment on a pattern 
tracing test, a result which was paralleled by Taylor et al (1945) 
with fasting subjects. Otherwise results obtained under such diverse 
conditions as sleep privation (Husband 1935), oxygen administration 
(Hinsie et al 1934), and restricted diets, were mainly negative or 
indeterminate. 


Garrett and Schneck (1933) give the reliability of the standard 
Bryan tracing board as .93 as measured on extended performance of 
children. For his form of the tracing-board, Gurnee (1939) reports 
uncorrected odd-even reliability coefficients of .82, for performance 
with the right hand, and .77, with the left. The relatively great 
sensitivity of these tests to practice is generally agreed on by 
workers who have used them. That mirror-tracing is especially susceptible 
to learning is attested by its frequent use in experiments designed to 
investigate changes in behavior resulting from practice. 


Seashore (1940) obtained a correlation of .30 + .09 between 
precision of thrust, and steadiness in tracing a narrow V-slot. The 
representativeness of this single coefficient is, however, not 
established. No study of factorial composition of path-tracing 
performance has been found. 


9. Dotting Tests 


In advance of empirical information which permits a more 
fundamental classification of ‘dotting’ behavior, it may tentatively 
be considered to have components of both steadiness-aiming and 
pursuit performance. A distinction between dotting tests and path- 
tracing tests considered above, lies in the determination of the 
rate of performance by the apparatus, in the former, and by the 
subject himself in the latter. In the Schuster (Farmer and Chambers 
1926) modification of the McDougall dotting test, which is considered 
representative of this type, the subject inserts a stylus in and out 
of as many as possible of a series of small holes arranged in a 
sinuous pattern on a disc rotating in a horizontal plane. The holes 
appear in a small aperture which at first exposes only one, but with 
the increasing speed of presentation during the latter part of the 
trial, exposes several holes at a time. Scoring is made automatic 
by an electrically operated counter which summates the total number 
of successful insertions of the stylus into the holes during a three 
minute run. 


Examination of Table 9 reveals that dotting performance has 
proved sensitive to the condition of simulated altitudes of 15,440 
feet and higher, but showed no significant decrement at 9,200 feet 
(McFarland 1937 I, 1937 II, 1938). Further, an increased decrement 
is reported to result from ingestion of alcohol at altitude. Other 
results are summarized in the table. 


The intratest reliability of a dotting test investigated by 
Melton (1947) is reported as .95 (corrected) for the entire test. 
Susceptibility of dotting behavior to practice has been repeatedly 
observed. 


The dotting test has been shown by Melton (1947) to be only 
slightly correlated with pilot success. Farmer and Chambers (1926, 
1929, 1939) have shown a positive relationship between high scores 
on the test and low accident rate in driving as well as in certain 
industrial jobs. A fairly high correlation between dotting and 
two-plate tapping is reported by Melton (1947). 


10. Pursuit Tests 


Tests classed as ‘pursuit’ tests have in common the 
presentation of one or several visual displays, which move in one or 
more dimensions as determined by the apparatus, the subject, or both, 
and which must be followed or guided by the subject by means of 
continuous movements of the hands, feet or other members of the body, 
working singly or cooperatively. Pursuit tests, like those of 
dotting, are ‘apparatus paced'. A number of types of pursuit test 
are distinguishable on the basis of what appear to be significant 
variations in the principles underlying the tests. Depending on 
whether or not the subject is enabled to control the motion of the 
display, the task is characterized as compensatory or non-compensatory. 
In the former type, the subject's efforts to compensate for deviations 
of the target from a prescribed position influence its behavior; in 
the latter case, movement of the target is determined entirely by 
the nature of the mechanical arrangements of the apparatus in 
independence of the subject's efforts. A second difference between 
pursuit tests relates to the number of spatial dimensions through 
which the target moves: whether in a plane to the right or left, 
or up and down, or forward and backward, or any combination of these. 
Pursuitmeters are classified as single- or multiple-dimensional 
according to the degrees of freedom of movement of the presentation 
and of the controlling devices. A third important way in which 
versions of pursuit tests differ is in the rate of movement of the 
display: whether at a uniform or periodic, or at a variable or 
aperiodic rate. Within the ‘variable’ type of pursuit test a further 
distinction is made between ‘rate’ pursuit tests, in which the speed 
of movement of the ‘follower’ within a non-compensatory pursuit task 
varies with the magnitude of the subject's response, in contrast 
with the ‘direct’ type in which the rate of movement of the follower 
remains constant. A distinction must be made, too, between tasks 
which require the use of a single hand and those which depend on the 
simultaneous coordination of both hands. While it would be misleading 
to urge that these differences as listed complete the possible 
varieties of pursuit tests, it is believed that they are sufficient 
for purposes of preliminary analysis. The following descriptions 
of tests exemplify the distinctions made immediately above: 


(1) Non-compensatory, uniform, pursuit. The widely employed 
Koerth pursuit rotor falls within this category. The subject is 
required to keep a hinged (pressureless) stylus in contact with a 
small brass target embedded near the outside rim of a rotating disc. 
Scores are derived from the number of fractional revolutions during 
which the stylus is held on the target. The SAM rotary pursuit 
test is an adaptation of Koerth's test differing mainly in size of 
target, size and kind of disc, and rate of rotation. Differences 
in performance on the two tests are discussed by Melton (1947). 


(2) Non-compensatory, variable, direct, pursuit. The 
principle of moving a visual target at an unpredictably variable 
rate of speed is the basis of a test reported by Farmer and Chambers 
(1926). A pointer operated by the subject must be controlled in 
such manner as to keep it in line with a second pointer whose 
direction and rate of movement are determined by an irregular cam. 
Deviations of the ‘follower’ from the target are cumulated automa- 
tically to yield a score. Modifications of this technique have 
been employed by McFarland and his co-workers (1932, 1936). 
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(3) Non-compensatory ‘rate’ pursuit. HExamples of this are 
reported by Melton (1947). Since the task involved in these tests 
appears to represent a highly specialized type of function, the 
rationale of which lies in its presumed similarity to the job of 
flying, it is not described here in detail. The Skilled Response 
Test, a composite task with a ‘rate pursuit’ component, has been
utilized in decrement-testing by the Cambridge group (Davis 1948) 
and is discussed later. (See Table 25) 


(4) Non-compensatory, two-hand, pursuit. The SAM Two-Hand
Coordination Test (Melton 1947) differs from pursuit tasks described
up to this point in the requirement that the two hands be employed
simultaneously in order to perform the task. Two lathe-like control 
handles are manipulated to keep a target follower on a visually 
perceived target moving at varying rates along an irregular pathway. 
Time spent off the target is cumulated by a clock to yield a score. 
Alternative scoring possibilities, such as ‘smoothness of control’,
and an ‘activity tension measure’ have been investigated by Melton 
(1947) and found lacking. 


(5) Compensatory, single-dimension pursuit. In the Miles
(1921) Pursuit Test the task required of the subject is to compensate
for changes in an electrical circuit, observed by the subject as
movements of a needle, by moving the slide of a rheostat either to
the right or left in order to keep the needle in line with a mark
on a screen. The amount of deviation from the zero-point is
integrated by wattmeters in arbitrary units. A recent modification 
of Miles' technique which is stated to be simpler and more dependable 
is the SAM Single-Dimension Pursuitmeter, reported by Melton (1947). 


(6) Compensatory, multiple-dimension pursuit. The SAM 
Artificial Horizon Pursuit Test (Melton 1947) provides an example 
of a compensatory task requiring movements up and down as well as 
to the right or left. To perform this task the subject must rotate
and simultaneously move in and out a single control-wheel, in such 
a way as to keep an ‘indicator bar’, drifting up and down and 
rotating about its center, in line with two fixed marks. In the 
SAM Multidimensional Pursuit Test a third dimension of movement 
is added. A panel viewed by the subject contains three instrument
dials, the pointers of which are movable by means of two controls -
a ‘stick’ and a ‘throttle’. The stick is displaceable in the two
plane dimensions, controlling two of the dials; the throttle, which 
is moved forward and backward, controls the third. The subject 
manipulates the controls in such a manner as to maintain the needles 
at their zero points. A comparable multidimensional apparatus with 
a different display is reported by Stevens (1941). 


A further complication of possible theoretical interest is 
a Pursuit-Steadiness Test developed by Farmer and Chambers (1929). 
In this test the subject is required to hold a stylus, with a ball 
on the end, inside a small metal cup which is moved irregularly in 
speed and direction. Scores are determined by the duration of
contacts of the stylus with the sides and bottom of the cup. 

A type of non-compensatory pursuit test differing from any described 
above adds the factor of physical work by utilizing a control lever 
which can be weighted from 2 to 40 pounds. A brief account of this 
test is given in Mackworth (1945, 1948). For a further discussion 
of apparatus and details of administration of pursuit tests, 
reference is made to Seashore (1928) and Melton (1947). 
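
The two kinds of score described above, time off target cumulated by a
clock and deviation of the follower cumulated in arbitrary units, may be
illustrated by a minimal sketch. It assumes that the record of a trial is
available as samples taken at a fixed interval; this sampling scheme, and
the function and parameter names, are illustrative assumptions and do not
describe the clocks or wattmeters of the original apparatus.

```python
def time_off_target(on_target, dt):
    """Seconds spent off target.

    on_target: sequence of booleans, one per sample (True = contact held).
    dt: assumed sampling interval in seconds (hypothetical parameter).
    """
    return sum(dt for hit in on_target if not hit)


def integrated_deviation(follower, target, dt):
    """Cumulated absolute deviation of follower from target.

    follower, target: positions sampled at the same instants.
    The result is in position units multiplied by seconds.
    """
    return sum(abs(f - t) * dt for f, t in zip(follower, target))


# Four samples at half-second intervals, two of them off target.
print(time_off_target([True, False, False, True], 0.5))
# Follower drifting away from a slowly moving target.
print(integrated_deviation([0, 1, 3], [0, 0, 1], 0.5))
```

A time-off-target score disregards how far the follower strays, whereas
the integrated score weights large deviations more heavily; the two need
not rank subjects identically.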


Since no marked functional differences are apparent in the 
results obtained with the several types of pursuitmeter (see Table 
12), it seems justifiable to group them together for purposes of 
present consideration. Investigations of altitude consistently show 
decrements in performance with all types of pursuit-tests (Grether 
and Smith 1942; Gagne and Smith 1943; Loucks 1944; Green 1947;
McFarland 1932, 1938; Barach, Brookes et al 1943; Brooks 1945).
Further, the pursuit test appears to be sensitive enough to detect
effects of dietary and other conditions superimposed on altitude
(Green et al 1945; Brooks 1945). Performance under the condition
of alcohol showed a decrement (Miles 1924; McFarland and Barach
1936). Two minor studies of sleep reduction report negative 
or indeterminate results (Laslett 1928; Husband 1935). Noise 
yielded no performance deficit as measured by a test of the present 
type (Stevens 1941). Other findings obtained under miscellaneous 
conditions are reported in Table 12. 


Intratest reliability coefficients for the Koerth pursuit rotor 
are given as high (.92, corrected) by Seashore, Buxton and McCollom
(1940), and by Melton (1947) for the SAM modification (.93 - .96,
corrected). The immediate test-retest reliability coefficient is
given as .88. In the same report the intratest value for the Miles 
compensatory pursuitmeter test is given as .74 (corrected, .90); 
the fact that calibration of this apparatus proves difficult may 
mean that the intertest values would be considerably lower. On the 
SAM modification of Miles’ test similar intratest values were found: 
.73 - .85 (uncorrected), and .84 - .92 (corrected). Test-retest
reliability values obtained under deficit-producing conditions showed 
considerable loss in the consistency of measurement: at ground level, 
.75, but at altitude, .19. These results on loss of test-retest 
reliability were confirmed by Green (1947). For the Artificial 
Horizon Test, a two-dimension pursuit task, values of .95 - .97 
(intratest, corrected) and .85 (test-retest, trials 2 and 3) are 
reported. Reliability of a rate pursuit test (SAM Rate Control Test) 
is stated to be relatively low (Melton 1947). Insufficient use of 
compensatory and non-compensatory pursuit tests under the same 
conditions of testing precludes any statement of comparison of their 
reliabilities. Investigators are in agreement that pursuit tests 
are influenced by practice effects. 
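

The report does not state how the ‘corrected’ coefficients were
obtained. A common procedure for intratest (split-half) values is the
Spearman-Brown formula, and the sketch below assumes, without
confirmation from the sources cited, that this was the correction
applied. On that assumption, uncorrected values of .73 and .85 step up
to about .84 and .92, which agrees with the corrected range quoted for
the SAM modification of Miles' test. The same doubling applied to .74
gives about .85 rather than .90, so that the Miles figure presumably
rests on a different basis.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times.

    r: reliability of the shorter test (e.g. a half-test correlation).
    k: lengthening factor; k = 2 is the usual split-half case.
    """
    return k * r / (1.0 + (k - 1.0) * r)


for r_half in (0.73, 0.85):
    print(f"{r_half:.2f} -> {spearman_brown(r_half):.2f}")
```
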


The relatively small amount of intercorrelational data,
coupled with the wide variety of pursuit tasks employed, precludes
any final statement about the generic characteristics of the tests.
However, it does appear that all compensatory pursuit tests show 
the same pattern of intercorrelation with other tests. The SAM 
Single-Dimension Pursuit Test, the Artificial Horizon Test, the 
Stevens Coordinated Serial Pursuit Test and the Multidimensional 
Pursuit Test all correlate with other types of tests as follows: 
moderately high with the SAM Complex Coordination Test, Two-Hand 
Coordination Test and the SAM Rotary Pursuit Test; lower with
Discrimination Reaction Time; and practically not at all with tests 
of finger dexterity and steadiness-aiming (Melton 1947). The rotary 
pursuit test correlates moderately high with Complex Coordination 
and Two-hand Coordination, but also correlates, to a greater extent 
than those listed immediately above, with finger dexterity and 
steadiness-aiming. Thus, relatively high correlations among the 
several types of pursuit tests have been demonstrated. Pursuit 
tests appear to be relatively heavily loaded with a single factor, 
which Melton names ‘coordination’, since it also appears in such
tests as Complex Coordination. Data presented by Seashore (1940) 
suggest low correlations between the rotary type pursuit test and 
the simpler motor functions of reaction time and tapping. It is 
also significant that tests of the present type show little relation- 
ship with paper and pencil tests in general. Pursuit tests have 
further been shown to have appreciable validity for the prediction 
of pilot success (Melton 1947). 


11. Discrimination Reaction-Time Tests 


In the simple reaction time test, previously considered, a 
single response is made to a single stimulus. The discrimination 
reaction time test is more complex in that one of two or more responses 
must be made, in correspondence with two or more dissimilar stimuli. 
The chief dimensions along which such tests may vary are with respect
to kind, number and complexity of both stimulus and response, as well 
as with respect to the relationships of the several stimuli to each 
other (Woodworth 1938). Differences in sensory modality, in quality, 
quantity, form and position of the stimuli occur within the variety 
of tests employed for measuring discrimination reaction time. Within 
any given test of visual discrimination reaction, the most commonly 
used type, the differential character of the several stimuli may be 
based on color, intensity, form or position, or on combinations of 
these. Responses likewise differ from test to test, and may be 
simple or complex, or may involve different reaction systems, as the 
hand, feet, or the entire body. Within a given test, the response may 
involve different movements of the same member, relatively similar 
movements of different members, or releasing movement to one stimulus 
and withholding it to others. Further, response may be discrete or 
serial. Lack of standardization in many of these respects all but
precludes comparison of results obtained with different tests.
Representative techniques for the measurement of discrimination
reaction time are as follows:


Visual choice reaction requiring response (pressing a key) to 
a green light and withholding of response to a red light has been
employed by Lee and Kleitman (1923), among others, as noted in 
Table 11. In a more complex form of what appears to be basically 
the same test, the subject must respond to only one of five lights, 
inhibiting response to any one of the remaining four when it is
flashed (Tuttle, Wilson and Daum 1949). 


A second type of discrimination reaction test differs from 
the choice reaction, described immediately above, in that response 
is not withheld to any stimulus, but, on any given trial, one of 
several possible responses is made, depending on which stimulus 
appears. Thus Hollingworth (1912) has made use of a test in which 
the subject responds with the right hand to a red light which 
appears on one side of a panel, and with the left hand to a blue
signal which always appears on the other side. Tests of this general 
type may be further distinguished depending on whether response must 
be based on both position and color, as in Hollingworth's test, or 
on color alone. In the former case, a light of given color appears 
invariably in the same position; in the latter, lights of any given 
color appear in any of the several positions on different trials (or 
lights may appear in the same single position on different trials).
No technique of the latter sort was noted, although its relation to 
other types of discrimination-reaction remains a problem of possible 
systematic importance. A purely positional type, in which the cues
made use of are of identical color but are presented in two different 
positions, has been employed by Seashore, Starman et al (1941). In 
this situation the subject lifts the appropriate hand, right or left, 
from either of two telegraph keys, depending on which one of two red 
lights, on the right or left, is presented. 


The color-positional response has been complicated by the 
addition of more stimuli and response possibilities in the Sillitoe
(1921) apparatus employed by McFarland (1932 and series). The 
subject must respond by pressing a key corresponding to the lighted 
member of five differently colored lights. A variation, involving 
response of the entire body, of this type of discrimination is 
reported by Keys and his associates (1945 and series). 


A third type of discrimination reaction test involves response 
to visual stimulus patterns differing from one another with respect 
to the spatial arrangements of their component parts, some of which 
are common to several presentations. The SAM Discrimination Reaction
Test (Melton 1947) requires that the subject react by pushing one 
of four toggle switches in response to the simultaneous lighting of 
a red and a green signal lamp. The position of the red light with 
respect to the green rather than the onset of a particular light, 
determines which of the four switches is the correct one. The subject 
is instructed in the technique of correct response, although it has
been suggested that a test requiring the subject to determine the 
correct response for himself would be interesting. In the standard 
SAM test, time required to operate the correct switch on each of a 
series of trials is accumulated on an electric time clock to yield 
the score. 


A mixed form of discrimination reaction test involving various 
techniques has been reported by Farmer and Chambers (1926). Varia- 
tions tried out with the apparatus involve use of different sensory 
modalities, varying numbers of stimuli (2 - 6), varying positions
of keys to which response must be made, prescribing response only 
when presentation is preceded or followed by another stimulus, basing
responses on coded patterns and a number of others. In its final 
form, this test was retained as a choice reaction to six stimuli 
involving different sensory modalities. 


Another type of discrimination reaction in which both stimulus 
and response are highly complex is the SAM Complex Coordination Test 
(Melton 1947). This test was developed to measure the ability of 
individuals to make control movements of an aeroplane type stick and 
rudder in response to successively presented combinations of visual 
signals. The subject is presented with three double rows of lights,
one row of each pair being red, and the other green. A pattern of 
three red lights, one in each of three rows, calls for a coordinated 
response (or successive responses) of hand and foot to light the 
corresponding green lights in the response row. After the match has 
been obtained and held for a brief period, a new pattern of lights is 
presented for another response. Score is either the number of patterns 
matched correctly in a fixed time or time required to complete a fixed 
number of patterns. A large number of variations have been introduced 
into the conditions of this test, at least some of which carry it 
beyond the scope of discrimination reaction. 


A further important distinction among discrimination reaction 
tests may be made in terms of their discrete or serial character. 
In the Seashore Discrimeter (Seashore 1928) the rate of response of 
the subject determines the rate of presentation of the subsequent 
stimuli. The subject is presented with one of four stimuli seen 
through an aperture. When the appropriate one of a bank of four keys 
is pressed, a tachistoscope is actuated, producing the next stimulus. 
Score is total time required to respond to a given number of signals. 
The Psychergometer of Bills (1936) appears to be based on a similar 
principle, but with the important difference that the aspect of 
performance selected for scoring is number and length of errors and 
‘blocks’ rather than time.
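
Scoring by ‘blocks’ may be illustrated with a minimal sketch. The report
does not define a block at this point; the sketch assumes the criterion
usually associated with Bills, namely a response whose duration is at
least twice the mean response time of the series. That criterion, the
`factor` parameter, and the function names are assumptions made for
illustration only.

```python
from statistics import mean


def find_blocks(response_times, factor=2.0):
    """Return (index, duration) for every response counted as a block.

    A response is counted when its time is at least `factor` times the
    mean of the whole series (assumed criterion, see lead-in).
    """
    threshold = factor * mean(response_times)
    return [(i, t) for i, t in enumerate(response_times) if t >= threshold]


def block_score(response_times, factor=2.0):
    """Summarize a series by number and total length of blocks."""
    blocks = find_blocks(response_times, factor)
    return {"count": len(blocks),
            "total_duration": sum(t for _, t in blocks)}


# Six responses in seconds; the fourth is a long pause.
times = [0.5, 0.5, 0.5, 2.5, 0.5, 0.5]
print(block_score(times))
```

In a series of this kind the total time is lengthened only moderately by
a single long pause, while the block count isolates it; this is one
reason a block index may respond to conditions that leave mean reaction
time nearly unchanged.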


A variant form of discrimination-reaction test, which differs
from those just described in two presumably important respects, is
that of Dockeray (1922). The subject is presented with one of five
lights of different intensity, and is required to press a key
corresponding to it. Since only one light is present at any given 
moment, discrimination must be made on the basis of absolute cues. 
Secondly, the discrimination of stimuli, differing only in degree 
of intensity, and similar in all other respects, probably makes the 
discrimination more difficult than in other tests considered. 


Results obtained with all types of discrimination-reaction 
tests are arrayed in Table 11. The findings support the statement 
that this type of test is sensitive to altitude and allied conditions 
at approximately 15,000 feet and above (McFarland 1932, 1937-I,
1937-II, 1938; McFarland and Dill 1938; Wespi 1933, 1936; Bills 1937; 
Bagby 1921). The findings of Gagne and Smith (1943) need not be 
interpreted as inconsistent with this generalization in view of the 
relatively short exposure (15 minutes) at 18,000 feet. Rahn, Otis 
et al (1946) report decrement in response to acapnia induced with a 
pneumolator at 30,000 feet, and McFarland (1938) has reported less 
impairment at altitude when 3% CO2 is added. Increased atmospheric
pressure is reported by Shilling and Willgrube (1937) to impair 
discrimination reaction time performance. A majority of studies of 
fatigue and related conditions (Lee and Kleitman 1923; Cooperman, 
Mullin and Kleitman 1934; Patrick and Gilbert 1896; Husband 1935) 
agree in reporting negative results. Bills (1937) and Tyler (1947), 
however, report decrement with more protracted conditions of testing. 
Noise and vibration showed no clear-cut effects (Dockeray 1922;
Baker 1937; Stevens 1941; Lewis 1943), although Taylor (1935) has
demonstrated some depressive influence of ‘startle’ produced by
loud noises. Under the condition of stress induced by air interruption,
Melton (1947) has reported a deficit on the Complex
Coordination Test. Keeton et al (1946) and Mitchell et al (1946) 
report decrement resulting from cold. Results on effects of diet 
and drugs are also summarized in the table. 


Reliabilities reported for discrimination-reaction time differ 
for the several types of test. For a choice discrimination-reaction,
Sisk (1926) has calculated a reliability of .64. Seashore, Starman 
et al (1941) give a range of coefficients from .83 to .89 for a visual 
positional discrimination reaction test. The intratest reliability 
of the SAM Discrimination Reaction Test has been determined to vary 
from .87 - .93 (corrected), and the test-retest value, as .78 
(Melton 1947). The same investigator gives a coefficient of .89 
(intratest, corrected) and .87 (test-retest) for the Complex 
Coordination Test. A number of alterations introduced into the 
test conditions, such as adding an additional task (materials to be 
memorized), reduced the consistency of measurement of the test 
(r's range from .50 - .79). If the reliability value of .95
(intratest corrected) given for a serial discrimination reaction 
time test by Melton (1947) may be assumed to be typical, it is 
important to note the gain represented in the self-paced type of 
test over the apparatus-paced type. In support of this point
is the high intratest reliability, .93 (corrected), offered by
Seashore for the Discrimeter (Seashore, Buxton and McCollom 1940).
Farmer and Chambers (1926) report sufficiently high intercorrela- 
tions between various types of complex reaction time tests to 
justify substituting a standard choice reaction for other more 
complex variations. 


The psychological functions underlying performance on these
tests are elucidated only by fragments of intercorrelational evidence.
Sisk (1926) has given the interrelationship between simple visual 
and choice reaction times as .67. Lanier (1934), using two highly 
similar tests reports a perfect correlation between a choice and a 
color-positional discrimination reaction test. The SAM Discrimination 
Reaction Test is stated by Melton to have a low (.23 - .43) inter- 
correlation with the Complex Coordination Test. The SAM Discrimination 
Reaction Test is correlated with the serial form of the same test 
(SAM Self-pacing Discrimination Reaction Test) to the extent of .58. 


Sisk (1926) reports a simple visual choice to correlate with 
cancellation to the extent of .13, with ‘making lines’, .30, and with 
simple reaction, .67. Performance on the Seashore discrimination
reaction test is reported to have a .06 correlation with steadiness 
(Seashore 1940). Melton (1947) finds positive intercorrelations 
between the Discrimination Reaction Test and the SAM Two-Hand
Coordination Test, and a finger dexterity test. The Complex Coordination
Test is reported by the same source to be related with other
tests as follows: with SAM Rotary Pursuit, .34 - .41; with a finger
dexterity test, .22 - .35; and with a steadiness test, .12. 


It may be significant that the SAM Discrimination Reaction Test 
has considerable validity for the prediction of all three types of 
aircraft personnel (pilots, navigators and bombardiers). Factorial 
analysis of this test shows it to be heavily loaded with a function 
measured by a number of paper and pencil tests, and to be high in a
‘psychomotor precision’ factor. The Complex Coordination Test
appears to be heavily loaded with a ‘coordination’ and a ‘perceptual’
factor. As Melton suggests: "Future research designed to determine
why this test has such ubiquitous validity should lead to a better
understanding of the psychological functions which must be measured
in a test used for the selection of aircraft pilots." (Melton 1947,
p. 176).


12. Naming Tests 


Rapidity in naming colors or forms presented in rapid 
succession to the subject, although often classified as an association 
test, has clear affinities with discrimination reaction time considered
in the preceding section. In tests of the present type the emphasis 
appears to fall less on the ‘symbolic’ aspect of the verbal response 
than it does on the purely motor side. On this assumption there 
appears to be little essential difference between the two tests of
Bills, for example, which are in virtually all respects identical 
except that in the color-naming test response is made to a voice key, 
and in the Psychergometer, to a manually operated key. What may 
serve to differentiate the form and color-naming tests from complex 
reaction times is the aspect of behavior selected for scoring; errors 
and 'blocks', which are usually not exploited by reaction time tests, 
are given more weight than time of reaction in the tests presently 
under consideration. 


In a standard form of the color-naming test, colored stimuli
are presented in rapid succession to the subject one at a time through
a small aperture. The subject responds by speaking the names of the
colors into a voice-key. Results obtained with a revision of the
older color-naming test in which manual response keys are substituted
for voice-keys, appear to parallel closely those taken with the original
verbal response form (Bills 1936 and series). The forbear of both
of these tests is the Woodworth-Wells (1911) color-naming test which
involves the use of a card containing 100 colored squares - yellow,
blue, black, red and green, arranged in ten rows and ten columns, in
random order. The subject is instructed to name the colors as rapidly 
as possible, and the total time in seconds and errors are recorded. 
This older form of the test, although it appears to offer less 
possibility of control than that of Bills, has been more widely used
in studies of deficit. 


Impairment in performance on color-naming tests under conditions 
of altitude is reported both in terms of response-latencies and errors 
(blocks) by Bills (1937) and by McFarland (1937-I, 1938). These data
suggest that number and duration of blocks is a more sensitive index
of the present type of performance than response time alone. Increased
blocking has also been shown by Lee and Kleitman (1920) and by Warren 
and Clark (1927) to result from sleep privation, but a third study 
of this factor (Cooperman, Mullin and Kleitman 1934) yielded no deficit. 
Alcohol, according to the studies of Hollingworth (1923-24), impairs 
performance, while caffeine facilitates it over the normal level (1912). 
Results obtained under other conditions are included in Table 12. 


That high reliability estimates for color- and form-naming have
been obtained may be inferred from the work of Lanier (1934). Inter-
correlation between color-naming and form-naming is reported by
Garrett and Schneck to have a value of .73. Lanier (1934), using 
speed as a measure of response has demonstrated a fairly high average 
intercorrelation (.57) to obtain between two tests of form-naming and 
two of color-naming. The same worker gives intercorrelations between 
form- and color-naming and other tests as follows: simple reaction 
time, -.35 to -.45; simple discrimination reaction time, -.38 to
-.49; cancellation, .20; substitution, .42; simple card sorting, .16;
and complex card sorting, .67.


13. Card-Sorting Tests


These tests have in common the requirement that the 
subject indicate discrimination between a series of cards of 
different kinds by placing them in a number of designated positions. 
The nature of the cards to be sorted differs from test to test, some 
utilizing playing cards, and others picture cards of various sorts. 
Likewise the number of categories into which the cards are to be 
sorted varies widely, in the tests cited, from 4-30. Discrimination 
is commonly based on visual cues; however, in some tests the subject 
must distinguish between the cards tactually, on the basis of holes 
of different patterns punched in the cards. Probably most significant
among the variables in tests of the present sort are differences
in instruction given to the subject, for, depending on these, as well 
as on the indices of performance selected for measurement, this test 
may become one of reversal of set, of learning, or of other functions 
presumably not sampled predominantly by the standard card sorting 
test. Scores may be derived from time consumed in completing the 
task, from errors and blocks, or combinations of these. 


Examination of results arrayed in Table 13 shows that altitude 
is the only condition investigated with tests of the present type 
which reveals a decrement, and then only at extreme altitudes 
(West et al 1944; Gerstell 1946; Hoffman et al 1946). Bagby (1921), 
using a rebreathing apparatus, and Lowson (1923) failed to obtain 
significant decrement at lower altitudes. No deficit in performance 
is reported for such varied conditions as 'fatigue' (Johnson 1922;
Husband 1935), smoking (Carver 1922) and noise (Stevens 1941).


Card sorting, according to findings summarized by Garrett and 
Schneck (1933) has a reliability range of .72 - .98, depending on 
the characteristics of the tests. It is noteworthy that the several 
versions of the test analyzed by Tinker et al (1932) all proved to 
be highly reliable. It has further been demonstrated that variability 
of response, speed of performance, and progressive learning are all 
influenced by changes in motor sequences and in complexity of the 
discrimination required of the subject. Card sorting tests are, 
according to general agreement, highly influenced by practice. 


On intuitive grounds card sorting tests are stated to measure 
speed of discrimination and reaction. According to evidence 
presented by Garrett and Schneck, card sorting correlates low both
with tests of physical capacity, at one extreme of performance, 
and with those of general intelligence, at the other. 


14. Cancellation Tests


Cancellation tests have the common requirement that the 
subject examine a series of stimuli (words, numbers or geometric 
forms) indicating, with maximum speed and accuracy, certain designated 
items as they recur. A time limit is usually set on the task and the 
score taken in terms of number of items cancelled during the time 
limit, with or without reference to errors. 
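
The scoring just described may be illustrated by a minimal sketch which
counts items correctly cancelled and, separately, the two kinds of error
(items wrongly marked and designated items passed over). The
representation of the test sheet as a sequence, and all names used, are
illustrative assumptions; `items` is taken to be only that portion of the
sheet which the subject reached within the time limit.

```python
def score_cancellation(items, marked, target):
    """Score one timed cancellation trial.

    items:  sequence of stimuli the subject reached within the time limit.
    marked: set of indices into `items` that the subject cancelled.
    target: the designated item to be cancelled.
    """
    hits = sum(1 for i in marked if items[i] == target)
    false_marks = len(marked) - hits
    omissions = sum(1 for i, x in enumerate(items)
                    if x == target and i not in marked)
    return {"cancelled": hits,
            "false_marks": false_marks,
            "omissions": omissions}


# Seven letters reached; the subject struck out positions 0, 2 and 3.
print(score_cancellation(list("abacada"), {0, 2, 3}, "a"))
```

A score taken ‘without reference to errors’ corresponds to the
`cancelled` count alone; the other two counts allow speed and accuracy
to be examined separately.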


Variations among cancellation tests involve such factors 
as: number and distribution of elements to be cancelled, degree of 
meaningfulness of material, duration of test, and, principally, the 
nature of the instruction given to the subject, whether simple or 
complex. For an account of standard cancellation procedures the 
reader is referred to Garrett and Schneck (1933). Recent improve- 
ments in the construction of tests of the present type have 
emphasized uniform stimulus elements, equal availability of items 
to be cancelled, and increased control of the degree of meaningful- 
ness of the material (Finan 1942; Weston 1945). 


Results obtained with cancellation tests under a variety of 
conditions are summarized in Table 14. Of two investigations of 
anoxic effects listed, both show a decrement in performance 
(Gellhorn 1937; Gellhorn and Joslyn 1937). Of two studies of the 
effects of noise, one yielded positive (Burris-Meyers et al 1942)
and the other less determinate results (Obata et al 1934). With
the exception of a deficit in cancellation performance under the
conditions of dietary deficiency (Guetzkow et al 1946) the remaining 
results obtained under conditions listed in Table 14 are either 
uninterpretable or negative. 


Reliabilities, which may be assumed to differ, depending on 
the type of test, have been reported for the Woodworth-Wells versions 
of number and letter cancellation as .76 and .80, respectively
(Garrett and Schneck 1933). Whipple (1914) states that reliability 
coefficients range from .60 - .97 for various forms and lengths of
the test. In a more recent study Travis (1947), using the Whipple 
letter cancellation test, obtained a reliability coefficient of .86 
(trials 1 and 2). Effects of practice on the test have been noted 
by a number of investigators. 


The psychological functions sampled by the test have been 
variously labeled, on intuitive grounds, as ‘attention’, ‘rate of 
perception’ and ‘discrimination’. It seems clear that although a
minimum of motor skill is demanded by cancellation, the test is not 
primarily one of motor ability. Travis (1947), in the same study 
cited above, reports intercorrelations of the Whipple letter 
cancellation test with other tests as follows: with accommodation
and convergence, .39; with motor speed as tested by a ‘reach and 
turn’ test, .44; with visual acuity as tested by the Snellen test 
chart, .19. Guilford (1947) on the basis of a factorial analysis 
suggests loadings of a perceptual and verbal-intellectual sort.
In line with such an interpretation, Garrett and Schneck (1933) 
point out that cancellation shows a "fair degree of correlation 
with those mental tests which demand quickness, such as analogies, 
completion and word building." 


15. Substitution Tests (Code) 


In code tests the common characteristic is the requirement 
that one set of characters be substituted for another set in accordance
with prescribed instructions. A significant difference between
the several types of substitution test appears to be the type of 
material substituted: whether letter for letter, digit for letter, 
digit for symbol, or other. Thus the Woodworth-Wells (1910-11) code 
substitution test, the forbear of such tests, requires that the 
subject respond by substituting digits for geometrical forms. The 
Johnson (Johnson and Paschal 1920) version, which has probably 
received widest use among tests of the present sort, requires the 
substitution of one letter for another. In this test, the top line 
on a printed page gives the alphabet, and a line immediately below 
it, the alphabetical code to be substituted in the materials below. 
The materials consist of five short lines to be transliterated. 
Twenty different forms of the test, presumably identical in degree 
of difficulty, are available for use. The subject is instructed 
to transliterate the 50 letters on the page as rapidly as possible. 
Scores may be based on time, or errors, or on both. Additional 
factors which differ from test to test are: degree of meaningful- 
ness of material to be transliterated, number of units to be 
substituted, availability of key to the subject (whether initially 
only, or throughout the test). The last mentioned variable appears
to be an especially important one, since it illustrates clearly how
a difference in procedure, within otherwise similar tests, alters
the nature of the psychological function involved; in the present
case from a test heavily loaded with ‘memory’ to one in which that
factor is minimized. Details of construction and administration 
of tests of the present type are given in Whipple (1914), in 
Garrett and Schneck (1933) and in Melton (1947). 
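
The letter-for-letter procedure described above can be sketched in a few
lines. This is only an illustration under stated assumptions: the code
alphabet here is a randomly shuffled alphabet, and the names make_key,
transliterate and score are hypothetical, not taken from Johnson and
Paschal. The sketch returns the two scores named in the text, errors and
time.

```python
import random
import string


def make_key(seed=0):
    """Build a hypothetical letter-for-letter code (a shuffled alphabet)."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))


def transliterate(text, key):
    """Substitute each letter by its code letter; other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text)


def score(response, material, key, seconds):
    """Return (errors, seconds) for one subject's written response."""
    correct = transliterate(material, key)
    # Count characters that differ from the correct transliteration.
    errors = sum(1 for r, c in zip(response, correct) if r != c)
    # Omitted or extra characters are also counted as errors (an assumption).
    errors += abs(len(response) - len(correct))
    return errors, seconds
```

Whether omissions should be scored as errors, as assumed here, would
depend on the scoring instructions of the particular form used.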


Results arrayed in Table 15 show that substitution performance 
is impaired under conditions of altitude (Johnson and Paschal 1920; 
Lowson 1923; McFarland 1937-I; 1937-III; 1938; McFarland and Dill 
1938; McFarland and Edwards 1937; Knehr 1940; Malmo and Finan 1944; 
Brooks 1945; Grether and Smith 1942; Gagne and Smith 1943). Some 
evidence indicates, too, that the test is sensitive to other
conditions added to altitude, such as 3% carbon dioxide (McFarland 1938),
methylene blue (Brooks 1945) and diet (Eckman et al 1945). However,
McFarland (1938) has shown recovery in performance after an hour's
stay at altitude, Malmo and Finan's (1944) data showed no impairment
in amount of work, but only in errors, and Loucks’ (1944) evidence
indicates an unreliable decrement with a brief stay at altitude.

Other decrements have been reported with alcohol (Hollingworth 1923-24; 
Miles 1924; McFarland and Barach 1936), sulfanilamide (Roughton et al 
1941) and Vitamin B deficiency (Berryman et al 1947). Cold tempera- 
tures are reported by Horvath and Freedman (1947) to induce decrement 
due to loss of finger dexterity. Absence of impairment is reported 
for a number of conditions, including noise (Stevens 1941; Burris- 
Meyers et al 1942), which are summarized in the table. 


Reliabilities are given by Garrett and Schneck (1933) for the 
Woodworth-Wells digit-symbol test as .70, and for another similar 
test as .78. Melton reports the test-retest reliability of the SAM 
Substitution Test to be .79 as determined on the ground, but only 
-.22 at altitude. This study (Melton 1947) concurs with other 
investigators in finding the code test somewhat susceptible to 
practice. In the same studies, however, it has been shown that the 
test employed was both stable and sensitive, with performance 
levelling off fairly rapidly. 


Code-substitution has classically been called an ‘association
test’. A number of studies have demonstrated a high degree of
correlation between code substitution and intelligence tests. 
Garrett and Schneck (1933) call attention to the motor factor in 
highly practiced performance, when the test may become little more 
than one of speed of writing. Melton (1947) gives correlations 
of code-substitution with other tests as follows: with SAM Single 
Dimension Pursuit, .33; with SAM Steadiness Aiming, .06; with 
SAM Peg Moving Test, .08; and with an addition test, .08. Results 
appear to demonstrate consistently little relationship between 
substitution tests and those of simple motor function. McFarland 
(1938) considers that the Johnson test measures a fairly wide range 
of psychological functions including attention, accuracy, adjust- 
ments of accommodation and convergence, and writing. 


16. Computation Tests 


Computation tests require the solution of simple and complex 
problems in addition, subtraction, multiplication and division, singly 
and in combination. In some tests a pencil or other aids may be used 
in determining the solution, in others the problems must be solved 
‘mentally’. The Woodworth-Wells (1910-11) test, which has been widely 
used, requires the continuous ‘mental’ addition of 100 two-place 
numbers presented in four columns of 25 each. In a different form of 
the test the subject is instructed to add a constant amount to each 
of the two-place numbers. In a more complex form of mental computa- 
tion Macht and Macht (1939) demanded the addition of a constant amount 
to a two-place number which is then multiplied by a second constant 
number; a third constant number must then be subtracted. A pencil 
and paper addition test requires the subject to add digits from left
to right until the sum equals a designated number. In a recent
modification described by Melton (1947) the problem consists of a
number of horizontal rows of figures, each one constituting a
single problem. Four to eight figures are added consecutively
until the sum is equal to an underscored number at the left of
the line. The subject draws a line after the last number used to
obtain the sum. In any strict sense a comparison of results
obtained with tests of the present type would require control of
many factors, usually neglected in the literature, such as (1) type
of mathematical manipulation involved; (2) length of test; (3) hori-
zontal or vertical position of figures; (4) number of elements in
each problem; (5) complex requirements such as adding constants,
doubling the sum, and the like; (6) aspect of performance selected
for measurement, whether speed or accuracy or both, as is the case
with most computation tests.
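
A single item of the modified addition test described by Melton may be
illustrated as follows. The sketch finds where the subject's line
belongs, that is, how many figures read from the left sum to the
underscored number. The function name cut_position and the sample
figures are hypothetical, and the figures are assumed to be positive.

```python
from itertools import accumulate


def cut_position(figures, target):
    """Return how many figures, read left to right, sum exactly to target.

    Returns None when no running total equals the target.
    Assumes all figures are positive, so totals only increase.
    """
    for count, running in enumerate(accumulate(figures), start=1):
        if running == target:
            return count
        if running > target:
            return None
    return None
```

For the row 4 7 2 9 5 3 with an underscored number of 22, the line would
be drawn after the fourth figure.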


In Table 16 are summarized effects of a large number of 
different conditions on performance with diverse computation tests 
taken as a group. It will be observed that altitude and allied
conditions are reported to yield decrement in computational perfor-
mance (Gellhorn 1937; Gellhorn and Joslyn 1937; Barach et al 1947;
Grether and Smith 1942; Gagne and Smith 1943; Loucks 1944; Russell
1948; Green et al 1945; Green 1947; Barach, Brookes et al 1945;
Eckman et al 1945; Rahn, Otis et al 1946). Exceptional data were
obtained by Bagby (1921) using a rebreathing technique, and in one
of four subjects by Barach, McFarland, and Seitz (1937) who attribute
their inconclusive finding to practice effects. Studies of fatigue
and allied conditions are in fairly good agreement that performance
on computation tests is not markedly impaired (Robinson and Herrmann
1922; Kleitman 1923; Lee and Kleitman 1923; Weiskotten 1925;
Laslett 1928; Ryan and Warner 1936; Dockeray 1915; Whiting and English
1925; Barmack 1938; Hollingworth 1939; White 1947; Muscio 1920).
A single exception is offered by Warren and Clark (1937) who report
an increased number of blocks in sleep-deprived subjects. Results
on noise (Ford 1929; Harmon 1933; Obata et al 1934) are complex.
Other findings are given in the table.


Few reliability coefficients for computational tests are 
available. Melton (1947) places the test-retest reliability of the 
SAM Addition Test as determined at ground level at .85 to .90, 
depending on the trials correlated. At altitude, however, the 
reliability coefficient on test-retest falls to .19. He has further 
demonstrated with this test, a practice curve which reaches a stable 
level fairly rapidly. Mace (1935) has shown that performance depends, 
among other things, on standards adopted by the subjects. Since 
these differ from subject to subject and from time to time this 
factor would tend to depress the reliability of the test. Guetzkow 
et al (1946, 1947) report a test-retest coefficient of .85 for a
two-by-one digit multiplication task.


The psychological functions measured by computational tests are
variously characterized in the literature. According to Whipple (1914),
elements of perception, movement, attention, retention, as well as
simple association are all involved in computational performance.
Melton's factorial evidence shows the highest intercorrelation between 
addition and single dimension pursuit, and lowest with code substitution, 
within the test battery employed. Of a numerical operations test, 
Guilford (1947, p. 83) reports that "little or no significant variance 
appears in any factor other than the one so characteristic of this test, 
and of other mathematical and numerical tests - the numerical factor". 
Correlations between computational tests and a simple motor function, 
such as finger dexterity, are low; their correlation with tests of 
general intelligence is, however, fairly high. 


17. Tests of Perceptual Judgment 


In tests of perceptual judgment the task is to compare two 
or more stimulus patterns of some degree of complexity. Although the 
bulk of such tests are based on vision, a number have been developed 
which depend on other sensory fields, as the tactile or kinaesthetic. 
The stimulus dimensions to be judged vary from test to test and include 
intensity, extent, duration, form and others. Some of the more widely 
used tests falling under the present category are those of weight 
judgment, size-weight discrimination, estimation of known size, line 
bisecting and trisecting, extension of curves, judgment of distance, 
relative speed, lapsed time, spatial correspondence, and the various 
form boards. For a discussion of the problems involved in the construc- 
tion and administration of such tests, the reader is referred to the 
works of Gibson (1947), McFarland (1946), and Fitts (1947). 


With few exceptions, perceptual judgment tests have indicated no 
decrement in performance under the conditions listed in Table 17. 
Aside from the work of McFarland (1937-III, 1932) in which time percep-
tion and form board manipulation are shown to be influenced at relatively
extreme altitudes, and that of Guetzkow and Brozek (1947), in which a
small effect on a spatial relations test under acute Vitamin B deficiency
has been demonstrated, results have been negative or inconclusive.


Few estimates of the reliabilities of tests of the present sort 
have been found. Grether (1942) reports that reliability obtained with 
a line-bisection test was unsatisfactory, possibly due to the subjects'
lack of knowledge of the results of their performance. Guetzkow et al
(1947) find a reliability of .90 (trials 20-21) for a ‘flags test’
(Thurstone) which depends on ‘mental rotation’ of the stimulus patterns 
in order to determine their similarity or dissimilarity. Practice 
effects with the latter test, while present, are reported to diminish 
rapidly. The Minnesota Paper Form-board, which is considered by 
Garrett and Schneck (1933) to measure form and space perception, is 
stated to have a reliability coefficient of .90.


According to Guilford (1947, p. 6) the space factor,
which some such tests may be presumed to deal with, is analyzable
into "awareness of spatial relations or arrangements; a spatial
orientation in which reference to the human body is important," 
and a second, ill defined factor, "a dynamic function, since it 
is present in most tasks involving movements of machinery, 
transformations of objects and changes in position." 


18. Miscellaneous Tests of Visual Perception 


Tests included within this category form a composite 
group considered together for little more than purposes of 
convenient description. 


Paper-maze tests, of several kinds, all requiring relatively
fine discrimination, have been used by several investigators.
Pollock and Bartlett (1932) describe an ‘eye-maze’ which must be
followed with the eye alone, without the use of a pencil. Mazes
are presented on a sheet of paper on which are printed three 
rectangles, each one containing a set of ten mazes. The subject 
marks with a pencil the exit of each of thirty mazes presented 
consecutively. Unit-scores are taken in terms of time to complete 
each 10 mazes; number of errors is also recorded. A slight 
impairment has been reported by these investigators on this test, 
with loud noise. 


Grether (1942) describes a ‘letter-maze’ in which the subject
is presented with a sheet of typewritten lines composed of A's
and C's. The task is to draw a continuous pathway of A's from
the left hand top of the sheet to the bottom of the sheet. Twenty
‘barriers’ occur on a test sheet in each of which there is only
one correct crossing-point out of approximately 20 possible choices. 
Score is the number of barriers crossed in three minutes. Ten 
equivalent forms of the test differing only in the position of 
crossing points are available. Learning proved to be virtually a 
negligible factor in the test and the range of scores adequate. 
Test-retest reliability computed for trials 2 and 3 is given as
.80. However, impairment with altitude, as measured by this test,
proved negligible (Grether and Smith 1942). 


Tests of reversible perspective have been employed by several
workers (Smith 1916; Hollingworth 1939; Ehrenstein, cited by Fitts 1947)
for detection of deficit under fatigue and anoxia, the combined
results of which are uninterpretable or negative. Procedures
require the subject to indicate, usually by pressing a key,
whenever an ambiguous stimulus such as a staircase, or a nest
of cubes appears to change its orientation. Although practice
effects are reported to be marked, reliability of these tests
may be inferred from a study by Fitts (1947) to be reasonably high.


Tests of number identities involve presenting the subject with
parallel pairs of figures, some of which are identical, others
non-identical. The task is to check the numbers, indicating whether
they are the same or different. Score is usually given in terms of
number of items correctly checked minus the number of errors during
a time interval. The intertest reliability of a form of this test
is given by Guetzkow and Brozek (1947) as .87 (21st and 22nd trials).
Intercorrelations of this test with other tests in their battery,
of the type included in general intelligence tests, ranged from .11
to .44. Use of these tests with diet, and a combination of cold and
diet, has yielded negative results (Glickman et al 1946; Guetzkow and
Brozek 1946). However, Consolazio et al (1947) report a decrement
with CO2 excess and O2 decrease in sealed chambers.
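
The usual scoring rule for number identities, items correctly checked
minus errors, may be sketched as below. The response codes "S" (same)
and "D" (different), the use of None for items not reached in the time
allowed, and the function name identities_score are all assumptions
made for illustration.

```python
def identities_score(pairs, responses):
    """Score = items correctly checked minus items wrongly checked.

    pairs     : list of (left, right) numbers shown side by side
    responses : list of "S", "D", or None (item not attempted)
    """
    correct = 0
    errors = 0
    for (left, right), resp in zip(pairs, responses):
        if resp is None:
            continue  # unattempted items add nothing and subtract nothing
        truth = "S" if left == right else "D"
        if resp == truth:
            correct += 1
        else:
            errors += 1
    return correct - errors
```

Subtracting errors in this way penalizes a subject who checks rapidly
but carelessly.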


Two tests of visual illusions were employed by Grether, Cowles 
and Jones (Fitts 1947) under anoxic conditions with a negative outcome. 
A group of tests, still in the developmental stage, which deal with 
illusions perceived under conditions of aircraft flight, such as 
angular acceleration and ‘g’, are under investigation by Graybiel and
his collaborators (1945, 1946a, 1946b, 1948). 


The Clock Test developed by Mackworth (1944, 1948a) presumably 
samples the psychological functions involved in prolonged visual search 
required of radar operators. A six-inch pointer moves 360° clockwise 
over a clear, white sheet in 100 discrete movements. At irregular 
intervals the pointer skips a full step. The subject is required to 
push a key whenever this double length movement occurs. A test session 
of two hours is divided into 20 min. intervals during which 12 double
length movements occur, followed by 10 minutes of regular movement.
The series is repeated four times. Information concerning results is
withheld from the subject during the test period. A decrement in
performance was observed after the first half-hour of continuous per-
formance, which was increased under the condition of divided attention,
when the subject was required to listen for a telephone message while
watching the clock. (See Section II of this report.) Error scores were
also shown to be increased by both high temperatures (Mackworth 1948b)
and ‘chilling’ (Ellis 1947) and decreased with benzedrine (Mackworth
and Winson 1947). Carpenter (1948) has shown that errors on the clock 
test correlate with rate of blinking under the fatiguing condition 
produced by two hours work on the test. On the basis of results 
obtained with 6 repetitions of the test by 10 subjects, Carpenter (1946) 
has concluded that practice effects, if present, are obscured by other 
variations. 
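
A decrement over successive half-hours on the Clock Test can be
tabulated as missed signals per period. The following is a minimal
sketch, not Mackworth's own scoring procedure: the response window of
8 seconds within which a key press counts as a detection is purely an
assumed value, and the function name misses_per_period is hypothetical.
Times are in seconds from the start of the two-hour session.

```python
# One regular movement of the pointer: 360 degrees in 100 discrete steps.
STEP_DEGREES = 360 / 100
# The signal to be detected is a movement of double length.
DOUBLE_STEP_DEGREES = 2 * STEP_DEGREES


def misses_per_period(signal_times, press_times, window=8.0,
                      period=1800.0, n_periods=4):
    """Count undetected double-length movements in each half-hour period.

    A signal counts as detected if any key press falls within `window`
    seconds after it (the window length is an assumption).
    """
    misses = [0] * n_periods
    for s in signal_times:
        detected = any(s <= p <= s + window for p in press_times)
        if not detected:
            index = min(int(s // period), n_periods - 1)
            misses[index] += 1
    return misses
```

A rising count from the first to the later periods would correspond to
the decrement described in the text.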


Other miscellaneous tests of visual perception are included in
the following table with the results obtained under various conditions.


19. Tests of Visual Perception Span


Tests of visual perception span measure the number of 
elements within a complex stimulus that can be immediately reported by 
a subject following its momentary exposure. Materials employed are 
varied, consisting of letters, short words, dots, digits, and the like. 
The short exposure interval, necessary to eliminate eye movements, is
usually produced tachistoscopically. The subject reports orally, or
sometimes by writing, as quickly as possible, those characteristics of
the stimuli specified in the instructions. Strict comparability among
these tests would demand control of a number of factors, often unheeded
in the literature, such as preparation of the subject, distance of
materials from the eyes, length of exposure, fixation of the eyes, and
intensity of background illumination. Scores are usually taken in
terms of the largest number of elements that can be grasped by the
subject, without error, in a single exposure. For a discussion of 
methods employed in tests of this type, the reader is referred to 
Garrett and Schneck (1933) and Chapman and Brown (1935). 


In several studies conducted at high altitude, McFarland found
a reduction in visual span for words; at lower altitudes the effect
is less apparent (1937-I, 1937-III, 1938). Seitz and Barmack (1940),
however, have failed to confirm McFarland's findings. Noise
(Stevens 1941) and vibration (Coermann 1939) have yielded no signifi-
cant effect on perceptual span.


No data on the reliability of tests of visual perception span 
were found. According to the findings of Barmack and Seitz (1940) 
performance on a test of perceptual span showed a practice trend of 
considerable magnitude. 


20. Tests of Fixation (Immediate Memory)


‘Immediate memory’ refers to reproduction of stimulus
materials, on the basis of a single exposure, when a brief time- 
interval is interposed between the occasions of presentation and repro- 
duction. Materials consist of forms, dots, letters, words, and the 
like, which are visually presented, or, in testing auditory memory 
span, taps or spoken words may be used. The subject's response may 
be oral or written. 


In the standard test of visual ‘memory span’ the stimulus-
patterns are presented to the subject at a fixed rate, each one
being exposed for as many seconds as there are elements in the stimulus.
Thus, if the presentation contains eight digits it is allowed to
continue for eight seconds. Following presentation the subject
reinstates as much as possible of what he has seen without support
of external cues. Number of elements usually ranges from three to
nine, with fifteen or more trials in the test. Scoring is often
accomplished on a point scale, the subject receiving credits,
weighted in proportion to degree of difficulty, for each item
in a series correctly reproduced. In the auditory form of the
test, the subject is commonly required to reproduce combinations 
of tappings sounded by the experimenter on a series of wooden 
blocks. The latter test is thereby complicated over the visual 
since it calls for memory of position as well as number of 
elements. In either form, the test may be made considerably more 
difficult by requiring backward reproduction of the stimulus 
sequence. In a complex variant of the present type of test,
Bagby (1921) used 49 miniature lamps arranged in rows of 7 and
mounted on a vertical panel. A scattered group of 3 to 7 lamps
was lighted for 3 seconds. The subject, following extinction
of the lights, designated, on a map corresponding to the possibilities
of the presentation, which ones had been lighted on a particular
trial.
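
The exposure rule and point-scale scoring of the standard visual memory
span test may be illustrated as follows. The text states only that
credits are weighted in proportion to difficulty; taking the weight to
be the length of the series, and crediting only series reproduced
without error, are assumptions of this sketch. The function names are
hypothetical.

```python
def exposure_seconds(series):
    """Exposure time: as many seconds as there are elements in the series."""
    return len(series)


def span_score(trials):
    """Sum of credits over trials.

    trials : list of (presented, reproduced) sequences.
    A series earns credit equal to its length (assumed weighting) only
    when it is reproduced exactly, in the original order.
    """
    return sum(len(presented)
               for presented, reproduced in trials
               if list(presented) == list(reproduced))
```

Under this weighting a correctly reproduced seven-digit series earns
more than twice the credit of a three-digit series.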


In a ‘location-memory’ test (Dorcus and Weigand 1929) the
subject is shown slides containing 6 different patterns of circular
spots varying in number from 3 to 8. The subject is required
to duplicate the presented pattern on a prepared sheet by filling
in those circles which had been shown on the slide. The score
is the number of circles correctly filled in minus the number 
incorrectly marked. 
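
The scoring rule of the location-memory test lends itself to a short
sketch using sets of circle positions. The numbering of positions and
the function name location_memory_score are assumptions made for
illustration.

```python
def location_memory_score(shown, marked):
    """Circles correctly filled in minus circles incorrectly marked.

    shown  : positions of the spots presented on the slide
    marked : positions the subject filled in on the prepared sheet
    """
    shown, marked = set(shown), set(marked)
    correct = len(marked & shown)    # filled in and actually shown
    incorrect = len(marked - shown)  # filled in but not shown
    return correct - incorrect
```

Spots that were shown but left unmarked earn no credit and, under the
rule as stated, carry no further penalty.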


An allied test has been used by Nixon (1946) in a series of 
experiments dealing with immediate memory for spatial relations. 
In brief, the method involves showing subjects one or more spots 
in various positions and at various distances apart on a white 
circle, and later requiring them to mark the positions on a blank 
replica of the original circle. Scores are derived directly from
errors of estimation. 


A further version of an immediate memory test by Finan and 
Hammond (1942) is believed to admit of increased quantification 
and control. The subject viewed a small aperture illuminated by 
a light of given intensity for 2 seconds duration. Either immediately
upon extinction of the light, or 15 seconds later, the subject
turned a knob which controlled the illumination of a second
aperture, in an effort to reproduce the original intensity as
closely as possible. A series consisted of 18 trials with four
different illuminations presented in random order. A control on
subject differences in discriminative capacity was obtained by
running a concurrent series of ‘matching’ trials in which the
subject was required to duplicate each of the four standard light
intensities (while these were actually present) by controlling
the illumination in an aperture immediately adjacent to that in
which the original light appeared. A total score was obtained by
subtracting the average ‘matching’ score from the average ‘memory’
score. Although the test-retest reliability of this test proved
low (.27), preliminary data suggest its sensitivity under extreme
conditions of altitude. 
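
The correction for discriminative capacity in the Finan and Hammond
test may be sketched as a difference of two averages. The text does not
state the unit in which a single trial was scored; the sketch assumes
that each trial is scored as the absolute difference between the
standard intensity and the intensity set by the subject. The function
names are hypothetical.

```python
from statistics import mean


def mean_abs_error(trials):
    """Average absolute error over (standard, reproduced) intensity pairs."""
    return mean(abs(reproduced - standard) for standard, reproduced in trials)


def total_score(memory_trials, matching_trials):
    """Average 'memory' score minus average 'matching' score.

    The remainder is the error attributable to the memory requirement
    rather than to the subject's ability to discriminate intensities.
    """
    return mean_abs_error(memory_trials) - mean_abs_error(matching_trials)
```

A subject with poor brightness discrimination would show large errors
on both series, and the subtraction removes that common component.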


Results summarized in Table 20 indicate that the tests of 
immediate memory employed revealed little or no deficit under the 
varied conditions of moderate anoxia, smoking, ‘fatigue’, carbon
monoxide and Vitamin B deficiency. 


According to Garrett and Schneck (1933) the reliability of 
tests of memory span has been reported to be high: .84 for auditory 
digit span and .74 for visual digit span. Nixon (1946) reports 
that the test-retest results on his version indicate reliable 
readings from day to day on the same subjects. Practice appears 
to improve performance to at least some extent. Range of scores 
yielded by standard tests of memory span has proved to be narrow, 
suggesting the restricted usefulness of this particular form of 
immediate memory test. 


High intercorrelations have been reported between visual and 
auditory memory span. Garrett and Schneck (1933) report an inter- 
correlation of .73 between digit span and number cancellation. A 
number of studies support a high degree of relationship between 
tests of memory span and those of general intelligence. In general, 
immediate memory has been found to correlate only to a small extent 
with tests of rote memory. In a series of highly suggestive experi- 
ments, Nixon (1946) studies immediate memory as a function of a 
number of variables including delay interval and the number and 
position of elements. A negative exponential function is suggested 
to describe results of ‘immediate forgetting’. 


21. Tests of Memory and Learning 


Tests of memory are distinguishable from those in which 
‘fixation’ receives primary emphasis, in terms of the length of time
interval interposed between the ‘acquisition’ and ‘reproduction’
occasions. In the tests of ‘immediate memory’ considered within
the preceding section, reinstatement followed immediately upon the
single presentation of the stimulus materials. In the case of
‘memory’ as contrasted with ‘immediate memory’, a longer period 
is interposed between the presentation and testing events. There 
appears to be sound evidence for distinguishing between two types 
of memory, one of which allows the subject, in his effort to reproduce 
an original situation, to utilize a stimulus-cue which is provided 
him by the experimenter, and another in which he is required to 
reinstate the original situation solely on the basis of self-given 
cues, presumably symbolic. In terms of the operations involved in 
testing these two kinds of memory, one presents the subject with a 
stimulus-cue that has been associated on one or more trials with
another, and is hence known as the method of ‘paired associates’.
In the second type, whatever material has been acquired during
previous training must be reinstated without such associated cues.
Following common usage we may designate the first type of perfor-
mance as ‘associative memory’, and the second as ‘reproductive
memory’. Learning is a generic concept which includes both
fixation and retention and which emphasizes comparison of amounts
of material that are reproduced by the subject at various points
in an extended practice sequence. In a strict sense, any test
whatever may be examined from the standpoint of improvement with
practice. Learning is hence a category which is not coordinate with
a classification of tests according to ‘performance’ as technically
defined to mean behavior at a given moment. Tests falling under
the present heading of memory and learning are ordered according
to the distinctions drawn immediately above. An account of the
variables to be considered in tests of the present type may be
found in McGeoch (1945).


A paradigm of the paired associate method is given by a study
of Hull (1924), in which a series of unfamiliar geometric figures
is presented to the subject paired with a list of nonsense syllables.
The subject is required to respond to each figure by speaking the
appropriate nonsense syllable into a voice key. Pairs of items are
presented singly until the list is completed. Serial cues are broken
up by presenting the cards in different order from trial to trial.
Scores are derived from the number of correct reproductions after a
fixed number of practice trials.


From Table 21 it may be seen that altitude is consistently 
reported to yield a decrement in associative memory (Bagby 1921;
McFarland 1937-I and 1937-III, 1938; Malmo and Finan 1944). It
appears, however, that the decrement occurs only under relatively
extreme conditions of anoxia. With the exception of alcohol
(Hollingworth 1923-24) other conditions described in the table
yielded no marked change in paired associate performance. 


Intratest reliability of the paired associates technique 
employed in their study was given by Malmo and Finan as .85 
(corrected). 


Another test employed by Hull (1925) provides an example of 
the reproductive memory type. Sixteen lists of nonsense syllables 
of sixteen items each are repeatedly presented to the subject up to 
the point of mastery, the number of trials required constituting 
the score. The subject reproduces as many of the syllables as 
possible without benefit of associated stimuli presented by the
experimenter, although self-produced serial cues may be assumed
to play a substitute role.


Aside from decrements reported by McFarland at high altitude
and by Mead (1939) and Cattell (1930) with alcohol, results obtained
with reproductive memory tests have revealed little alteration in
performance under the conditions arrayed in Table 21.


Materials and techniques employed by these tests are probably 
too diverse to permit any general statement of reliability. 


In so far as tests of associative and reproductive memory have 
both lumped together the factors of fixation and reproduction, they 
may be regarded as measures of learning. In the tests employed by
Edwards (1941a) and by Keys and his collaborators (1945) primary
emphasis is on changes resulting from repeated practice rather than
on reproduction per se. Consequently these tests are to be classified
as learning rather than as memory. No clear-cut differences in rate
of learning were revealed by these tests under conditions of sleep
privation and Vitamin B deficiency.


22. Tests of Associative Relations and Reasoning 


Tests falling in the present group emphasize the require- 
ments of facility and speed of making controlled associated responses. 
Depending on the instructions given, the subject responds to a verbal 
stimulus with a word opposite, analogous, or otherwise related to the 
stimulus word. The verbal response given, in some cases together
with its latency, is recorded. In tests of reasoning, both complexity
and degree of control of associations are presumably increased. 


A first group of tests, dealing with production of words from a
fixed number of letters presented to the subject, yields mainly negative
findings as shown in Table 22. In a common form of tests of the present
type, Guetzkow and Brozek (1946, 1947) require the subject to form as
many words as possible beginning with a given letter prescribed by the
tester, in a unit time. Test-retest reliability of this test is
reported as .77 (21st and 22nd trials). Although practice effects are
prominent, performance is reported to become stable enough for testing
by the 21st trial. No decrement on this test was observed under
conditions of Vitamin B deficiency. A more complex variant of the 
test, developed by Reynolds and Shaffer (1943) involves presenting 
a sheet with two columns of words to the subject. In the first column 
are 52 eight-letter words, in the second, the same words in different 
order, with the letters scrambled and with one letter omitted. Scores 
are taken in terms of the number of words formed in a unit time. A 
decrement on this test is reported with administration of sulfathiazole. 
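
The matching requirement of the Reynolds and Shaffer variant, in which
each scrambled item contains the letters of one eight-letter word with
one letter omitted, can be expressed as a comparison of letter counts.
The sketch below is illustrative only; the sample words and the
function names matches and count_formed are hypothetical.

```python
from collections import Counter


def matches(word, scrambled):
    """True if `scrambled` holds the letters of `word` with exactly one omitted."""
    word_letters = Counter(word.lower())
    item_letters = Counter(scrambled.lower())
    missing = word_letters - item_letters  # letters of the word not in the item
    extra = item_letters - word_letters    # letters in the item not in the word
    return not extra and sum(missing.values()) == 1


def count_formed(answers, words):
    """Number of scrambled items for which the subject wrote a valid word.

    answers : dict mapping each scrambled item to the word written by the subject
    words   : the words printed in the first column of the sheet
    """
    return sum(1 for item, answer in answers.items()
               if answer in words and matches(answer, item))
```

The count returned corresponds to the number of words formed, which the
text gives as the score when taken over a unit time.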


From Table 22 it is also seen that isolated investigations
employing tests of logical relations and reasoning report deficit
under conditions of ‘fatigue’ (Smith 1916) and alcohol (Hollingworth
1923-24). With caffeine (Hollingworth 1912; Flory and Gilbert 1943)
and benzedrine (Flory and Gilbert 1943) slight increments in perfor-
mance are reported. Other studies dealing with sleep privation
(Lee and Kleitman 1923), tobacco (Carver 1922), and aspirin (Davis
1936), have yielded negative or inconclusive results.


Reliability coefficients of .80 to .95 are given by Garrett and
Schneck (1933) for an opposites test, and of .88 for a test of 
difficult analogies. Standardization data for various types of 
logical relations are given by Woodworth and Wells (1910-11). 


Spearman holds that tests of controlled association are among 
the most heavily loaded with a ‘g’ factor. A number of studies
cited by Garrett and Schneck bear out a high degree of relationship 
between tests of opposites and analogies and those of general 
intelligence. 


23. Tests of Perseveration (Change of Set) 


The rationale of tests under this category is apparently 
the role of ‘stable set’ or interference of one type of performance 
with a subsequent one in many types of inefficient or maladaptive 
performance. Tests of perseveration are characterized by changing
the task-instruction to the opposite of what it is habitually, or
what it has been immediately preceding. Variant forms of this
test are based on the interposition of opposed tasks alternately,
within the same test. Scores are taken in terms of speed or accuracy,
on the two parts of the test, or occasionally in terms of a qualita-
tive analysis of errors. In a perseveration test employed by
McFarland (1937-I) the subject was required to add and subtract
alternately a series of digits. The subjects were then instructed
to perform the mathematical opposite to what the plus and minus signs
indicated. A further example of a complex ‘directions’ test which
appears to have strong perseverative components has been used by
Loucks (cf. Melton 1947). The subject is required to enter a letter
beside a two-place number in accordance with seven different instruc-
tions such as "If the number is odd, write the letter C", or "If
the number is odd and divisible by three, write the letter D", or
"If the number is odd or even and is divisible by five, write the
letter B", etc. Included within the present group are two tests of
mirror-tracing (Louttit 1943; Peters 1946), so classified, because 
although a path-tracing component is clearly present in these tests, 
the emphasis as determined by the indices of performance chosen for 
measurement is on the inability to reverse a well established eye- 
hand coordination. 


Results obtained with perseveration tests are summarized in
Table 23. McFarland (1937-I, 1937-III) has demonstrated deficit
in performance of the present sort at altitudes above 15,000 feet.
Kleemeier and Kleemeier (1947), employing a variety of perseveration
tests, have shown a consistent improvement in performance with benze-
drine. Other findings with conditions of dietary deficiency (Guetzkow
and Brozek 1946), altitude (15 min. at 18,000 feet) (Loucks 1944) and
repetitive work (Wyatt and Langdon 1937) showed no impairment.


No reliability estimates for this type of test were found, 
possibly because of the ambiguous meaning of the usual reliability 
statistics as applied to tests of this sort. According to Guilford 
(1947, p. 564), "The hypothesis that change of set is a fundamental 
trait that can be measured by a battery of tests was not proved to
be justified by results achieved".


24. Miscellaneous Performance Tests 


Several tests which do not fall in any of the categories 
outlined in the preceding summarizations are considered under the 
heading of ‘miscellaneous’ tests. 


A first type of test is based on deterioration in quality or 
quantity of handwriting. Interest in this effect is, however, less
on the aspect of motor efficiency than on that of the ‘central’
integrative mechanisms assumed to be involved. In the experiment
of McKenzie, Riesen et al (1945) the cessation of handwriting under
extreme anoxic conditions is regarded as a limit of performance.
Writing under such conditions degenerates into a scribble and stops
at a point which is considered to be the ‘end-point’ of consciousness.
The validity and sensitivity of the test are seen in the finding that
for each 1000 feet increase in altitude between 25,000 and 32,000
feet, the duration of handwriting is decreased by approximately 20
seconds. A number of other investigators have shown handwriting
deterioration as a function of altitude (see Van Liere 1942). The
table includes only those references where an attempt was made to 
quantify the observations (McFarland 1938; Hemingway 1944). 
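

The altitude relationship reported above can be restated as simple
arithmetic. The following is a minimal sketch, assuming the decrease
is strictly linear at the approximate figure of 20 seconds per 1000
feet. The function name and the rejection of altitudes outside the
25,000 to 32,000 foot range are illustrative assumptions; no absolute
baseline duration is implied, since none is given in the text.

```python
# Illustrative only: a linear restatement of the reported handwriting finding.
SECONDS_LOST_PER_1000_FT = 20.0    # approximate figure quoted in the text
LOW_FT, HIGH_FT = 25_000, 32_000   # range over which the figure is reported


def handwriting_loss_seconds(altitude_ft: float) -> float:
    """Approximate reduction in handwriting duration relative to 25,000 feet."""
    if not LOW_FT <= altitude_ft <= HIGH_FT:
        raise ValueError("figure is reported only between 25,000 and 32,000 feet")
    return (altitude_ft - LOW_FT) / 1000.0 * SECONDS_LOST_PER_1000_FT


if __name__ == "__main__":
    for altitude in (25_000, 28_000, 32_000):
        print(altitude, handwriting_loss_seconds(altitude))
```

On this assumption the full 7000-foot range corresponds to a loss of
roughly 140 seconds of handwriting time.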


In a complex 'coding' test described by Mackworth (1948a), the 
subject is required to place a large and a small block on each of a
number of pegs, in accordance with coded instructions. Decrement in 
performance on this test under high Effective Temperature has been 
demonstrated in the same study. 


Cattell (1941) has devised a ‘Cursive Miniature Situation Test’
which makes use of a moving strip of paper on which are printed a
succession of lines and other geometric patterns. The subject
performs certain prescribed tasks on the figures during their
exposure through a small aperture. According to its designer, this
test, by virtue of standardized, prearranged difficulties, frustra-
tions and demands on judgment, measures such qualities as quickness
of decision, resourcefulness, excitability, patience, restraint,
enterprise, etc. The split-half reliability of the test is reported
to be high, .90 (corrected). A significant difference between the
performance of normal and psychotic subjects has been demonstrated
with the test. Performance is stated not to be related to age,
intelligence or education. McFarland and Franzen (1943) report,
on the basis of use of the test on aviators, that it has potential
as a selection test for aircraft personnel, but point to the burden-
some scoring as a disadvantage. Keys et al (1945) have used the
present test to indicate possible effects of dietary privation with
inconclusive results.
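

The ‘corrected’ split-half figure quoted above can be illustrated
numerically. The sketch below assumes that the correction applied was
the Spearman-Brown formula, which is the usual one for split-half
coefficients but is not named in the text; the function names are
hypothetical.

```python
# Illustrative only: assumes the "corrected" value is a Spearman-Brown correction.

def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to an estimated full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def half_from_corrected(r_full: float) -> float:
    """Invert the formula: the half-test correlation implied by a corrected value."""
    return r_full / (2.0 - r_full)


if __name__ == "__main__":
    r_half = half_from_corrected(0.90)
    print(round(r_half, 3), round(spearman_brown(r_half), 3))
```

Under this assumption a corrected coefficient of .90 would correspond
to a correlation of about .82 between the two halves of the test.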


A 'stress test' developed by Freeman (1945) requires the subject 
to perform two disparate acts under conditions of distraction. On 
the right side of a panel is presented a series of simple discrimination
problems to which the subject responds at his own rate.
On the left side of the same panel are shown a series of numerical,
form, and letter equations which would not be difficult to follow if
they were presented alone. The subject signals whether the equation
is right or wrong by pushing the appropriate one of two levers with
the feet. The number of correct discriminations and problems is
integrated automatically by means of counters. The ‘stress’ component
is provided by auditory distractions to which the subject is forced
to listen since comments relevant to performance are included among
irrelevant ones. A third task added (reproduction of rhythm code
patterns) resulted in the refusal of a 'majority' of subjects to
continue with the task. There appears to be some evidence that
neurotic subjects show greater disturbance than normals both during
and following the test.


A further test devised by Farmer and Chambers (1926, 1929, 1933) 
is based on the movement of a set of levers which control the appearance
and disappearance of certain numbers on a dial. The subject strives
to produce prescribed combinations of figures involving the manipula-
tion of not fewer than three levers. Scores are taken from the time
required to achieve the correct result. Pollock and Bartlett (1922) 
have utilized this test in a study of the effects of noise with the 
finding that an initial decrement gave way to unimpaired performance. 


The relative neglect of motivational factors in performance has 
evoked criticism from a number of investigators who have attempted
to construct tests of various kinds that might overcome this lack.
The ‘Guidit Test’, developed by Pollock (1929), presumably provides
the subject with more task-incentive than most performance tests.
A small metal ball must be guided up an inclined plane with a fine
knitting needle set in a handle. Between the bottom of the board
and the goal at the top are 21 holes large enough for the ball to
drop through, and 7 barriers to be circumvented. A system of marks
is employed to credit moving the ball a given distance; time consumed
is also recorded. No decrement in this performance was found by
Pollock and Bartlett under noise. McFarland (1932), using the test,
reports effects on performance of individuals under anoxic conditions. 


A pin-ball machine reported by Melton (1947) yielded negative
results at altitude.


25. Complex Tests Simulating Aspects of Flight Performance


References listed in the accompanying table provide represen- 
tative samples of tests which simulate one or more aspects of flight 
performance. Within this group are placed ‘trainer tests’ which 
involve bodily displacement in a miniature fuselage, requiring 
coordinated movements of a set of controls by the subject who in order 
to fly a ‘course’ must maintain the balance of the apparatus in 
response to visual, kinaesthetic and other stimuli. Detailed infor- 
mation on apparatus, procedures and reliabilities of several tests 
of the present type, including the Link Trainer, are given by Melton 
(1947). D. R. Davis (1942, 1948), using a Silloth Trainer, has shown 
no progressive deterioration of performance during a five-hour test 
period. However, an increase in the number of good and bad ‘patches’ 
of performance was observed. 


Also falling in the category under discussion is the ‘Cambridge
Cockpit’ described by Craik (1940), and used by Drew (1942), Bartlett
(1943, 1947), and Davis (1946a, 1946b, 1947, 1948). The subject is
given verbal instructions along with a written summary of task require-
ments and is then put through a series of exercises involving maneuvers
as well as straight and level flying. The total amount of movement
of each of the controls is summated and graphic records of movements
of the aileron and elevator are taken. The apparatus resembles a Link
Trainer but differs with respect to controls and immobile fuselage.
Under the condition of noise no change in accuracy of control was
observed, but there was evidence of greater variability in the magnitude
of displacement of the controls. For a further consideration of results
obtained with this apparatus see Section II of this report. Validity
of this test has been established in some measure by demonstrating a
correlation between scores and accident proneness within a small group
of pilots.


As a matrix of test ideas the ‘Cambridge Cockpit’ has proved its
value and has generated, among others, the Skilled Response Test (Davis 
1946a, 1948) which appears to involve some features of discrimination 
reaction time combined with rate pursuit as well as conflict. A 
stimulus appears either to the right or left of a display consisting of 
three vertical lines. A pointer manipulated by an aircraft-type of 
control must be displaced from the neutral position either to the right
or left by moving the control correspondingly, setting up a rate of 
movement of the needle proportional to the magnitude of displacement 
of the control. The subject's task is to bring the pointer into alignment
with the correct lateral line by accurately timing his return of
the control to the neutral position. The response requires accurate 
timing if overshooting and the necessity for secondary compensatory 
movements are to be avoided. The magnitude of displacement of the 
control and accuracy of alignment are measured. Data are available to 
indicate that complicating the presentation by lighting the lateral 
lines simultaneously, especially to the same intensity (discrimination 
conflict), results in greater displacement of the control, together with
exaggerated overshooting and increase in the frequency of secondary
responses. When noise was added as a condition, no change in the mean 
accuracy of performance was observed, but variability of displacement 
was increased. These studies have possible implications for analysis 
of performance in general in their derivation from a complex but 
presumably well controlled situation, rather than from a gross job 
analysis, as well as in their emphasis on variability and qualitative 
aspects of performance rather than merely on accuracy or speed of 
performance.
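

The rate-control feature of the Skilled Response Test described above,
in which pointer velocity is proportional to control displacement, can
be sketched as a small simulation. This is an illustration under
stated assumptions and not a reconstruction of the apparatus: the
gain, the time step, the instantaneous return of the control to
neutral, and the function name are all hypothetical.

```python
# Illustrative only: pointer velocity proportional to control displacement.

def final_pointer_error(target, displacement, release_time,
                        gain=1.0, dt=0.01, duration=5.0):
    """Return final pointer position minus target (positive means overshoot).

    The control is held at `displacement` until `release_time`, then
    returned to neutral (zero) instantaneously. All values are arbitrary units.
    """
    steps = int(round(duration / dt))
    release_step = int(round(release_time / dt))
    position = 0.0
    for step in range(steps):
        control = displacement if step < release_step else 0.0
        position += gain * control * dt   # rate control: velocity = gain * control
    return position - target


if __name__ == "__main__":
    # Correctly timed release for a displacement of 2 units.
    print(final_pointer_error(1.0, 2.0, 0.50))
    # Release 0.10 time units late with the same displacement.
    print(final_pointer_error(1.0, 2.0, 0.60))
    # Same lateness with a displacement of 4 units (ideal release 0.25).
    print(final_pointer_error(1.0, 4.0, 0.35))
```

In this sketch the same error of timing produces a larger overshoot
when the control displacement is larger, which is one way the reported
association between greater displacement and exaggerated overshooting
could arise.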


The Stevens Coordinating Serial Reaction Test (Stevens 1941) is 
superficially analyzable into a complex discrimination reaction test 
in combination with a two-dimensional non-compensatory pursuit test. 
The subject manipulates stick and rudder type aeroplane controls which 
govern the movement of a rectangular spot of light on a screen. The 
spot of light is moved along a number of curvilinear pathways presented 
on the screen, in order to extinguish lights placed at the ends of the 
pathways. The target lights placed at the corners and center of the 
screen flash on in irregular order, one at a time, following comple- 
tion of a trial-response. Score is given in terms of time required to 
extinguish 50 lights; time spent off the pathway is also recorded by 
means of a counter. Stevens, using the present test, reports a 5% 
decrement in performance under the condition of loud noise. Pincus 
and Hoagland (1943) have shown that pregnenolone improves performance
over the level normally attained after two hours of work on the 
apparatus. According to these investigators the Coordinating Serial 
Reaction Test has a protracted practice curve and fails to eliminate 
the possibility of rhythmic performance which might be assumed to 
allow the subject to compensate for deficit. 


A final test discussed under the present heading is the Dial-
Matching Test (Hoffman and Mead 1943) in which the subject is required
to (1) align the pointer of an inner dial with that of another, outer
dial, whose movements are determined by the movements of an irregular
cam; (2) keep track of deflections of 12 ammeters by throwing a toggle-
switch when a deflection is noted; (3) signal at the end of every
ten-minute interval; (4) indicate by a signal when a miniature
aeroplane arrives at various stopping points represented on a map;
(5) indicate by a signal when a miniature Zeppelin appears in any of
four quadrants of the map. Failure to make any response correctly
causes the aeroplane to stop in its flight until the appropriate 
adjustment is made, resulting in longer time required for the plane 
to arrive at a terminal point. Score is obtained automatically in 
terms of the time and errors made by the subject in his effort to 
prevent the plane from being delayed. No significant changes were 
observed during four hours of performance on the test (Clark et al 
1943). The nature of the present task is highly complex, involving 
elements of pursuit, discrimination and attention, as well as integra-
tion of behavior-segments under a goal.


In the judgment of the writers, the chief justification of
the ‘miniature situation’ test is in its assumed relationship to a
criterion situation to be predicted. Therefore, this type of test,
while not necessarily unrelated to performance, may not be specifical-
ly designed to measure decrement. A further objection is that such
tests scarcely push the problem any farther toward the ultimate goal
of analysis of the psychological components which underlie the complex
performance product. On the other hand, the British use of the
miniature situation as a semi-controlled source of leads to be
exploited independently under isolated conditions appears to yield
fruitful results.


SUMMARY OF RESULTS AND GENERAL CONCLUSIONS OF SECTION I


Compilation of test methods and findings given in the preceding 
parts of the report suggests several conclusions of possible value
to future research efforts concerned with the testing of performance 
decrement. 


(1) From Table 26, which recapitulates test data by the 
condition of altitude, it appears that the availability of tests 
showing gross performance decrement under this condition does not 
pose a serious problem. In fact, despite wide variations in anoxic 
environments, and tests employed to measure their effects, virtually 
all types of tests yield a decrement in performance, providing the 
anoxic condition is sufficiently extreme. Evidence presented supports 
the view that the effects of anoxia are reflected ubiquitously through-
out the range of behavioral functions measured by performance tests.
A similar statement, based on considerably less evidence, however, can
be made of the conditions of heat and cold, as is shown in Table 29.
By contrast, noise and/or vibration appears from Table 28 to have
pronounced effects on very few behavioral functions as measured.
Fatigue, as will be seen in Section II of the present report, has
no clear cut effects on many dimensions of behavior. Effects of
conditions superimposed on altitude, as summarized in Table 27, 
likewise appear to require especially sensitive measurement in order 
to be made manifest (assuming that the conditions have an effect). 


Again, in spite of dissimilarities in tests classed under the 
same categories, consistencies in the data suggest that certain types
of tests are more affected by the condition of altitude than are others. 
The obvious possibility of a simple relationship between sensitivity 
and degree of complexity of test is not supported by the findings, 
although it appears that neither an extremely simple type of test, such 
as strength of grip, nor an extremely complex one, such as general intel- 
ligence, reveals marked decrement. Steadiness and steadiness-aiming 
indicate impairment in efficiency at altitudes as low as any at which 
effects have been demonstrated. It may be significant that the pursuit 
tests, which may also be presumed to have a strong ‘precision’ component, 
are influenced at lower altitudes than most other types of test. Of 
possible interest, too, is the fact that, among the more complex tests 
which employ errors as an index of scoring, such as code-substitution 
and computation, most show impairment at moderate altitudes. In the 
cases of color naming (Bills 1937; McFarland 1937 and series), code 
substitution (Malmo and Finan 1944), and computation (Warren and Clark 
1937; Barach, Brookes et al 1943), where direct comparison between time 
and error scores is possible, the latter prove more readily influenced. 
Such comparisons, however, are only partially justified, since errors 
contribute mediately to time scores in many types of tests. Within 
the most complex tests, reproductive memory is, according to the frag- 
mentary evidence summarized in the table, less sensitive to anoxia than 
associative memory, possibly as a result of increased availability of 
serial cues to the subject, in the former. Perceptual tests, including 
perceptual span, and tests of ‘immediate memory’ do not place among the 
most sensitive. Results summarizing effects of various conditions
superimposed on altitude (see Table 27) are difficult to interpret as
in, or out of line with the findings on altitude alone.


(2) A survey of the field emphasizes throughout the need for
standardization of tests employed to detect deficit. Few instances 
can be found in the literature in which a given test, even when it 
has been demonstrated to possess reasonably adequate reliability and 
sensitivity characteristics for most purposes, has been exactly 
duplicated by subsequent investigators. In so far as alterations 
in testing procedures represent progressive improvement, they are to 
some extent justified. All too often, however, it seems apparent 
that test variations result from a lack of familiarity with the 
extant body of knowledge on decrement testing, or worse yet, from 
lack of appreciation of the necessity to control as many relevant 
factors as possible. Workers in fields allied to psychology, who 
are understandably unskilled in the use of tests of the present type, 
are particularly guilty of the latter offense. Psychologists might 
well settle on a battery of performance tests designed to sample 
the major dimensions of performance, in so far as they can be 
identified at present, by the most efficient techniques currently 
available. The resulting gain in comparability of data might well 
compensate for the sacrifice of technical progress entailed. 


(3) Results presented in the study contraindicate the widely 
held view that a single test can be considered an ‘index’ of
‘psychomotor performance’ in general. Different tests are observed
to behave differentially under the same environmental condition, and
the same test is noted to be influenced differentially under dissimilar
conditions. Intercorrelations between tests judged similar on intui-
tive grounds prove, for the most part, to be relatively low. This
point is seen dramatically in the case of such a simple function as
strength of hand grip (see Table A-7) which may not be assumed to 
measure strength in general, nor general muscular tone, nor, in fact, 
anything more than the strength of the particular members tested. 


(4) Attempts to interpret test functions make especially apparent 
the need for an adequately founded approach to the basic factors 
underlying performance. This lack of fundamental classificatory 
principles goes beyond mere logical nicety, since it stands in the 
way of any attempt to cope with the practical problem of designating 
a battery of tests which can be relied on to sample the major dimen- 
sions of performance. While distinctions can be made in terms of 
testing operations, little basis is provided for weighting those
which are significant and others which can be regarded as negligible.
Some guidance is given by fragments of factorial and intercorrelational
evidence but the grouping of factors is still far too narrow to
provide a useful conceptual framework. Little is accomplished by
grouping the tests under such conceptual categories as ‘discrimination’,
‘association’, ‘logical relations’, ‘change of set’, and the like,
both because of the high degree of overlapping of these functions in
many tests and because of their unempirical basis. Needed studies
aimed directly at the problem of isolating component functions could
be accomplished by successive experiments within the same testing
situation, in which factors of presumed significance are systematically
varied. A promising start along the lines suggested has been made by
Melton's coworkers (1947) in studies comparing the sensitivity of four 
pursuitmeters of different degrees of complexity under constant condi- 
tions. In another approach designed to obtain basic information 
related to performance tests, Cockett (1947) has systematically 
investigated the relationship between complexity and reliability of 
serial reactions of certain types. On the basis of this study, the 
interesting hypothesis is advanced that reliability increases as a
function of degree of integration of the response required by the
task. It is believed that a fruitful research program could be
undertaken by utilizing, for example, a reaction time situation which could
be adapted to permit variations in complexity of reaction, in type of 
instruction given, in types of stimuli to be discriminated, in the 
succession of response, and many other variables, with all other 
factors held constant. Such functions as discrimination reaction, 
conflict reaction, perceptual judgment, change of set, could be 
studied with the possibility of relating the various types of functions 
with each other directly and with a maximum of control. 


(5) The greater sensitivity of errors as an index of scoring,
at least under some conditions, suggests several interesting lines of 
speculation in view of recent work done by the Cambridge Applied 
Psychology Unit. In a number of studies, time measures, by themselves, 
proved to show less decrement in performance than more complex measures 
in which errors are weighted. Thus Davis (1948), using prolonged 
performance in the Cambridge Cockpit as a fatiguing condition, has 
shown decrement in terms of the ratio of total duration of errors to 
the number of movements of the controls. Two types of ‘abnormal’ 
reactions to fatigue have been demonstrated, one in which the control 
movements are high in relation to duration of errors, and another in 
which duration is high relative to number of movements. Qualitative
analysis of errors has proved to yield highly suggestive results.
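

The ratio index attributed to Davis above can be made concrete with a
short sketch. The ratio itself (total duration of errors divided by
number of control movements) follows the description in the text; the
record layout, the baseline value, and the tolerance used to separate
the two ‘abnormal’ patterns are hypothetical and are introduced only
for illustration.

```python
# Illustrative only: thresholds and record layout are assumptions, not Davis's.
from dataclasses import dataclass


@dataclass
class CockpitRecord:
    error_duration_s: float    # total time spent in error during the period
    control_movements: int     # number of control movements in the same period


def error_movement_ratio(record: CockpitRecord) -> float:
    """Total duration of errors per control movement."""
    if record.control_movements <= 0:
        raise ValueError("at least one control movement is required")
    return record.error_duration_s / record.control_movements


def classify(record: CockpitRecord, baseline_ratio: float,
             tolerance: float = 0.5) -> str:
    """Label a record relative to a baseline ratio (hypothetical cut-offs)."""
    ratio = error_movement_ratio(record)
    if ratio < baseline_ratio * (1.0 - tolerance):
        return "movements high relative to error duration"
    if ratio > baseline_ratio * (1.0 + tolerance):
        return "error duration high relative to movements"
    return "near baseline"


if __name__ == "__main__":
    print(classify(CockpitRecord(10.0, 200), baseline_ratio=0.2))
    print(classify(CockpitRecord(60.0, 100), baseline_ratio=0.2))
```

A single ratio of this kind separates the two patterns that a time
score or a movement count taken alone would confound.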


It is significant that the Cambridge Cockpit studies which were 
designed to give information concerning kind and duration of errors 
showed deterioration in performance while the complex task of Hoffman 
and Mead (1943) did not reveal decrement during an equivalent or 
longer test period. One factor in this difference may be that in 
the latter test a failure to respond caused the machine to stop and 
thus direct knowledge and immediate correction of errors was possible; 
duration of errors was probably a negligible factor during the total 
test period and omission of tasks unlikely. In the former studies it 
was shown not only that the period before compensation for errors 
became significantly longer during the course of the test period, 
but that some tasks were omitted entirely.


(6) Variability of response as a further dimension of scoring
has been noted by Davis (1948) as well as by a number of other
investigators (McFarland 1937, series; Ryan and Warner 1936;
Green 1947; and others). An additional method of analysis is
suggested by Mackworth (1948b) who reports differential decrement 
between groups of good initial performers and those subjects who 
performed more poorly initially. 


Interpretation of deficit in terms of ‘blocking’, ‘conflict’
and ‘disintegration’ of behavior by the Cambridge investigators
suggests a possible line of investigation to test the applicability
of the conflict analysis of the Yale group to problems of performance
decrement. A more detailed analysis and evaluation of these studies
is included in Section II of the present report.


SECTION II
STUDIES OF FATIGUE, LOSS OF SLEEP, APPREHENSION AND STRESS 
INTRODUCTION 


No attempt has been made to define ‘fatigue’ for the reason 
that the term has no generally accepted meaning. Detailed reviews 
of recent concepts may be found in the recent books by Bartley and 
Chute (1947), and Carmichael and Dearborn (1947). We have undertaken 
rather to steer clear of controversial definitions by employing 
such terms as decrement, or deterioration in performance. In those
cases where the deterioration has affected the integrated personality
use has been made of the customary phrase, operational fatigue.
Otherwise the term ‘fatigue’ has been carried in single quotes.
We review (1) studies with tests administered after subjects have
engaged in activity which is presumably ‘fatiguing’; (2) studies of
deterioration during the progress of the task itself; (3) studies 
employing stimuli introduced to heighten stress; (4) studies of 
deterioration in the performance of tasks simulating flight; and 
(5) studies of operational fatigue. 


Studies with Tests Administered after the Subjects have Engaged in 
Activity Presumably Fatiguing.


1. After Sleep Deprivation


Absence of sleep has not been found to result in significant
decrement in such tasks as (1) simple reaction time (Patrick and
Gilbert 1896; Lee and Kleitman 1923; Cooperman, Mullin and Kleitman
1934; Edwards 1941; and Tyler 1947); in (2) tapping performance
(Patrick and Gilbert 1896; Robinson and Herrmann 1922; Husband 1935;
Katz and Landis 1935; Warren and Clark 1937; Edwards 1941; and
Tyler 1947); in (3) stationary arm-hand steadiness (Cooperman et al
1934; Edwards 1941; Tyler 1947); in (4) cancellation (Lee and
Kleitman 1923; Weiskotten 1925; except for Tyler 1947 after prolonged
vigil); in (5) discrimination reaction (Patrick and Gilbert 1896;
Lee and Kleitman 1923; Cooperman et al 1934; Husband 1935), except
that the Tufts College 1942 study obtained decrement after 50 hours,
and Tyler (1947) obtained a decrement on a 10-minute test after
60 hours sleep privation, although there was none on a 2-minute
test. Other tests in which scores are not lowered by loss of sleep
are (6) letter naming (Patrick and Gilbert 1896; Robinson and
Herrmann 1922; Kleitman 1923); (7) arithmetical computation
(Robinson and Herrmann 1922; Kleitman 1923; Lee and Kleitman 1923;
Weiskotten 1925; Laslett 1928), with the exception that Warren and
Clark (1937) report 'blocking' after a 65-hour vigil in a test
involving alternate addition and subtraction; nor was there
decrement in (8) time judgment (Tyler 1947), in (9) immediate
memory (Tyler 1947), in (10) an opposites test (Lee and Kleitman
1923), or (11) in a paired associates memory test (Weiskotten 1925),
after three days deprivation.


In the case of the following tests results have been inconsistent,
some studies showing decrement, others no loss after sleep deprivation:
(1) Body sway. Lee and Kleitman (1923), Laslett (1928), and Cooperman
et al (1934) all found no decrement; Tyler (1947) found insignificant
decrement; and the results of Husband (1935) and Edwards (1941) were
indeterminate. In (2) aiming, Robinson and Herrmann (1922) found no
decrement after 65 hours of sleep privation, but Edwards' (1941)
results were indeterminate. In (3) color naming, Lee and Kleitman
(1923) report decrement in a long series test but not in a short
one. Cooperman et al (1934) report no decrement after 60 hours
deprivation, but Warren and Clark (1937) report an increase in errors
after 65 hours. In (4) a pursuit test there are two studies:
Husband's (1935) report shows no loss, and Laslett's (1928) results
are indeterminate. Two studies of (5) a logical relations test
give inconsistent results, Smith (1916) reporting loss of efficiency,
but Lee and Kleitman (1923) finding no loss after 112 hours of sleep
privation. In two early studies of (6) reversible perspective
(Ash 1914 and Smith 1916), ‘fatigue’ lengthened the intervals
between fluctuations of equivocal figures, but Hollingworth (1939) found
that from the beginning of each run the rate of fluctuation increased
from 25 to 50 per cent, the curves nearly paralleling the curves report-
ing the subject's feeling of strain. The results obtained by Edwards
(1941) with reference to (7) typing and telegraphy, (8) the Ranschburg
memory test, and (9) an intelligence test were inconclusive. In the
case of those who took alternate forms of the American Council on
Education (ACE) Intelligence Test, five subjects practically maintained
their initial score after 96 hours of sleep privation and two even
improved their scores, whereas the majority of the 16 subjects showed
a decrement after 48 hours. In four other studies (Robinson and
Richardson-Robinson 1922; Laslett 1928; Husband 1935; Katz and Landis
1935) there was no decrement in test intelligence after various amounts
of sleep deprivation. In the tests of (10) code substitution four
studies give indeterminate results (Laslett 1924 and 1928; Weiskotten
and Ferguson 1930; and Husband 1935).


Studies of eye movements after sleep deprivation have not obtained 
consistent results. Miles (1929) and Miles and Laslett (1931) report 
a slowing of (11) saccadic movements and (12) frequency of blinking 
in subjects who were very sleepy after being deprived of sleep for 
66 hours. After keeping subjects on a similar vigil, Clark and Warren 
(1940) obtained no uniform change in (13) the number of fixations, 
in (14) regressions per line, in (15) binocular adjustments, or in 
(16) reading time. In fact, in some cases, performance scores were 
higher on the final tests. They attributed the changes that did occur 
not to the sleep deprivation of 65 hours but to temporary failure to
overcome "the greater subjective threshold of attention and effort".
Tyler's (1947) study of sleep deprivation which extended to 112
hours found no decrement in (17) critical fusion frequency.


In the Tufts College studies (1942), (also Hoffman and Mead 
1943, and Clark et al 1943), subjects were submitted to 50 hours 
of sleep deficit, in addition to a 30-mile hike, after which they
were tested for visual discrimination, reaction time, eye-hand
coordination, other similar sensori-motor coordination tests, stereo-
ranging and azimuth tracing. Most of these tests, although requiring
keen observation, were essentially simple and discrete in nature. The
performances required either momentary attention and response, or only 
a short period of continuous work. Under these conditions there were 
no material decrements in performance. When, however, subjects were 
required to remain attentive for an hour and a half to a prolonged 
continuous task, there was a measurable decrease in efficiency. 
Tyler (1947) included administration of the Rorschach Test in the 
composite used with service men submitted to long period of sleep 
deprivation. No alteration in the Rorschach responses were apparent 
after 112 hours of vigil. 


2. After Hours of Driving 


One method of testing for ‘fatigue’ has been to compare the 
results on tests of short duration of subjects who have engaged in a 
specific type of work for contrasting periods of time. An example 
of this type of investigation is reported by Jones et al (1941). 
Significant decrements in performance appeared in the following 
tests, listed in order of relative loss: (1) tapping speed, (2) mani- 
pulation or manual coordination, (3) body sway, (4) simple reaction 
time, (5) manual steadiness, and (6) critical frequency of flicker 
fusion. Other tests which failed to yield consistently significant 
differences between the hours-of-driving groups and those who had 
driven recently were (7) a simulated driving test, (8) resistance to 
glare, (9) speed of eye movement, and (10) accuracy in aiming (where 
there was a non-graded decrement). There was no loss in estimation 
of the size of a dollar bill and fifty-cent piece. In the test of 
strength of grip the majority of those who had driven ten hours 
scored higher than those who had not driven since their last period 
of sleep. 


Two other studies report the results of tests taken after 
extended periods of driving. Ryan and Warner (1936) report decrements 
in (1) body sway, (2) arm-hand steadiness, (3) aiming steadiness, 
(4) arithmetical computations and (5) color naming; Swope 
(1933) reported decrement in arm-hand steadiness. 


3. After Repetitive Work 


Tests of various sorts administered after subjects have 
been engaged in activity presumably ‘fatiguing’ have given inconsistent 
results. Dockeray (1915, 1922) had his subjects engage 
in ‘mental work’ of various sorts, including the multiplication 
of 3-place numbers. There was subsequently some lowering of 
scores in (1) a paired associates test with nonsense syllables 
and in (2) the discrimination of sounds. On the other hand, 
Whiting and English (1925) found no decrement in (3) judging the 
lengths of lines in the afternoon as compared with the morning. 
White (1947) administered (4) code substitution and (5) arithmetical 
computations after his subject had completed the exacting 
task of flying around the world in 147 hours. There was no decrement 
in his subject's performance. A ‘fatigue run’ of 18 hours 
involving marching, calisthenics, and military exercises involving 
a minimum of hand work, however, did result in a decrement in 
(6) hand grip (Fisher and Birren 1946). With the test of 
(7) critical fusion frequency, Brozek and Keys (1944) found no 
change after an hour of exercise, a result which was also obtained 
by Graybiel et al (1943) with pilots after the daily flying schedule. 
Other studies, however, have reported a decrease in frequency after 
‘fatigue’. Simonson and Enzer (1941) found a decrement paralleling 
the subjective report of fatigue after the working day. Henry (1942) 
reported a decrease in the fusion frequency, and Lee (in Jones et al, 
1941) reported decrement after three hours' work with a microscope, 
as well as after hours of driving. 


Several workers have reported an increase in (8) the frequency 
of blinking after continuous use of the eyes, as in reading (Luckiesh 
and Moss 1937, 1940; Hoffman 1946, and Carpenter 1948). This finding, 
however, has not been confirmed by Bitterman and his colleagues 
(1945, 1946, 1947), nor by Carmichael and Dearborn (1947). 


Studies of Deterioration Occurring During the Progress of the 
Task Itself. 


It being obvious that there is a limit to the length of time 
one can continue to work without a period of rest, the problem is to 
note the conditions which facilitate lengthening the work period 
without ultimate loss of efficiency. (1) Fernberger (1916) found 
an increase rather than decrement after testing ability to discriminate 
lifted weights for a period of an hour. (2) Muscio (1922b), 
with himself as subject, used an aiming test and a pursuit test; 
two spells with each for a total stretch of 10 hours 20 minutes. 
In the first three of the four divisions of the test session, 
there was a gradual and continuous improvement up to the end of the 
third hour; in the fourth, accuracy increased for about three-quarters 
of an hour, after which there was a steady decrement 
amounting to about 60 per cent. Muscio's conclusion was that 
inaccuracy was a function of the rate rather than of the duration 
of the work. (3) Vernon (1926) used these aiming and pursuit tests 
and added two others: paper-folding and maze-tracing, employing 
two subjects. In the pendulum test, there was a decrease during 
the second hour of a three-hour test, followed by either a levelling 
off or an increase in the third hour. Results in the aiming test 
were indeterminate. In paper-folding there was a slight increase 
in amount done, accuracy remaining fairly level through a 3- or a 
4-hour stretch. In the maze test, similarly, accuracy and output 
held up through 4-hour stretches, even when there were two a day. 
Afternoon shifts were, however, less efficient than the morning 
periods, and considerable boredom was reported, periods of boredom 
synchronizing with records of inaccuracy. (4) Pollock (1929) 
tested one subject with the "Guidit Test", for 4-hour stretches, 
after 60 odd preliminary sessions of shorter duration. In all 
periods of one hour or more, accuracy tended to decrease in the 
second as compared with the first half of the period. Accuracy 
remained fairly constant, however, after the first hour even in 
spells of 8 hours, and speed increased in the second half of the 
sessions. Feelings of tiredness were accompanied by a decrease 
of about 10 per cent in accuracy, but they were not marked by 
significant changes in speed. (5) Bills (1925, 1937), who was 
studying fluctuations in energy output, tested color and form 
naming in periods of 30 minutes and an hour. He found an increase 
in the length and the frequency of ‘blocks’ with increasing duration 
of the test period. (6) Laird (1923) found a decrement after 
a 4-hour period of testing with a dotting machine, the amount 
varying according to the nature of an accompanying noise. In this 
case, it is impossible to tell what the effect would have been without 
the noise. (7) Barmack (1929) reports a decrement in a pursuit test 
after two hours of work. In 10 of the 15 subjects this was accompanied 
by a report of boredom, as indicated on a self-rating sheet. 
(8) Lindsley (1943-44) recorded the results when radar operators 
continued A-scan oscillograph operation continuously for four hours. 
There was a progressive loss in the detection of signals and in the 
accuracy of determining the azimuth or bearing of targets represented 
by the signals. He noted that occasional prolonged periods of 
operation may be served without appreciable loss of efficiency. This 
conclusion is not entirely unequivocal, however, since factors of 
initial adjustment in the 4-hour period of operation and of learning 
may have masked the fatigue effects. (9) Pincus and Hoagland (1943) 
obtained a decrement in a "targetmeter" on which subjects were tested 
for four hours, and (10) Brozek, Simonson and Keys (1947), measuring 
acuity of vision in a test requiring the recognition of letters, 
report consistent and progressive decrement, with increasing variability, 
in a 2-hour test period. (11) Hollingworth (1939), on the 
other hand, reports that in an 8-hour test of number cancellation, 
practice effects masked any possible decrement; and (12) Philip 
(1939, 1940), who had subjects engage in continuous tapping for 
6 or 7 hours at approximately maximum rate, found a decrement of 
only 6.7 per cent. 
(13) Mackworth (1944, 1948a) discovered that the introduction 
of a ‘tension’ factor significantly affected the scores for accuracy 
in the ‘clock test’. Subjects were informed that at some time during 
the 2-hour test a ‘message’ would arrive. Under these conditions 
there was a decrement that was larger than was usual - until the 
‘message’ was received. However, the message ‘dramatically reduced’ 
the number of missed signals, raising the average number for the 
third half-hour to a standard usual with fresh subjects. He attributes 
the added increment in errors to the ‘tension’, and the improvement 
which followed receipt of the message to ‘release of tension’. 
(14) In the test by Clark et al (1943) in Dial-Matching (see Hoffman 
and Mead 1943 for description of apparatus), subjects watched dials 
continuously for 4 hours, aligning a pointer while keeping track of the 
deflection of a dozen ammeters. There were no significant changes either 
of improvement or decrement, a result which was attributed to adequate 
motivation. 


(15) In Hoffman's (1946) 4-hour test of continuous ‘easy and light’ 
reading, sample records were taken at the end of each half-hour, but 
the subjects were not informed as to when records of their achievement 
would be made, nor informed as to their progress. Under these conditions 
the number of lines decreased significantly, a decrement showing up at 
the end of the first half-hour. The number of blinks increased after 
the first hour. (16) In a subsequent study (Carmichael and Dearborn 
1947), in which Hoffman participated, subjects were required to read 
continuously for six hours. Some of the material consisted of book 
pages, some of microfilm. Continuous records were made of the eye 
movements, including blinking. In no one of the measurements was there a 
significant decrement and there was no increase in the rate of blinking, 
even though the task was increasingly unpleasant for some of the 40 
subjects. The experimenters attribute the achievement to adequate 
motivation of the subjects who were kept informed concerning their 
records and were paid $4.50 for each 6-hour session. 


Studies of Stimuli Introduced to Heighten Stress. 


Some evidence of the effect on test scores when stimuli are 
introduced which might be expected to increase the ‘tension’ of the 
subjects is furnished by the AAF Research Program (Melton 1947). 
Employing some of the standardized tests used for the classification 
of candidates, two major modifications in the stimulus complex were 
introduced. The first consisted of loud sounds and/or verbal threats 
of failure and criticisms, supplemented by a digit memory test beyond 
the candidate's memory span; the second consisted of interruptions 
of the air supply. 


The tests on which the experiment was tried with the introduction 
of these modifications were (1) tests of arm-hand steadiness and of 
aiming (designated respectively as Steadiness under Pressure Test, and 
Aiming Steadiness Test); (2) a peg board test (Conflicting Manipulation 
Test), with mouse traps inserted between the pegs; (3) the SAM Complex 
Coordination Test, and (4) the Two-Hand Coordination Test. The 
method of air-interruption, which was used with the last two 
mentioned tests, involved fitting the subjects with the face-piece 
of a gas mask. Scores made under air-deprivation were 
compared with scores obtained in the course of normal processing, 
the test having been introduced as an extra following the six 
standard processing tests. In the tests administered under air- 
interruption there was some decrement in the scores. On the 
other hand, the addition of verbal ‘stress’ stimuli and ‘distracting’ 
mental activity, did not greatly influence the results. In the 
Arm-Hand (Static) Steadiness Test the scores were actually better 
(though not significantly) than under normal conditions, and in 
the Aiming Steadiness Test scores also improved slightly. 


In the case of the peg-board (Conflicting Manipulation under 
Pressure Test) but two 2.5-minute trials were administered, a 
number insufficient to determine whether or not the distractions 
produced deficit in performance. In actual fact, efficiency in 
manipulating the pegs was greater under pressure than without it, 
and better scores were made in the second trial under pressure 
than in the first trial (Melton 1947). 


In connection with two of the AAF Research studies, experimental 
situations are reported in which it was presumed to be possible to 
obtain data concerning the influence of stress. In the first of 
these studies a Controls Confusion Apparatus was employed with the 
Observational Stress Test. The assumption was that the test was 
sufficiently complicated to arouse heightened tension without 
introducing any extraneous distractions. The assignment requires 
the subject to get all seven of a set of airplane controls correctly 
adjusted simultaneously. The Stick, Pedal and "T" controls each 
have ten contact positions, the "R" control, four; the other three 
controls, two each. These capital letters each correspond to those 
on the lights, except for "R", which is connected with a buzzer. 


The Observational Stress Test employed the same apparatus, but 
administration of the test differs in one respect. The subject in 
a room alone is subjected to oral criticism of his performance. No 
data are reported comparing scores made under criticism with those 
in which subjects were not being criticized, nor are the data re- 
ported by which comparison can be made between the scores on the 
succeeding trials (the first two of which were of 3-minute duration, 
the last of two minutes). 


The Multiple-Control Stress Test consisted of six tasks, each 
selected to represent part of a work sample of flying, and the test 
so arranged that whenever the candidate exceeds a given tolerated 
margin of error in any one of the tasks a distinctive and unpleasant 
sound on an earphone notifies him of his failure. The tasks are 
progressively more difficult. There were eight 1-minute trials, 
which progressively increased in difficulty from the third to the 
sixth trial. The results, which were available for but 56 civilian 
pilot trainees, were considered highly questionable because of "the 
distortion in the scoring system". No data are available with 
reference to any possible deterioration in test performance. 


Grip Tension. Scores representing the amount of tension 
involved in grasping ‘the stick’ were obtained for the normal four 
2-minute trials in the Complex Coordination Test. It was found 
that grip-tension scores were not significantly lower when the 
candidates were being retested. However, the pressure scores 
decreased during the course of the test. "This is in line with 
expectations based on laboratory studies of muscular tension 
changes during learning and continuous performance" (Melton 1947, 
p. 165). 


Muscle-Action Potential Tests. Reference is here made to 
action potential studies as representing a variety of psychophysiological 
methods which were tried experimentally in the AAF toward 
the close of the war. A method for summating action potentials was 
employed which has high reliability (.95). Records were made while 
administering the Multidimensional Pursuit Test (Melton 1947) for 
six 1-minute trials, separated by 15-second rest periods. During 
the course of the test there was a marked drop in the action potential 
index between the first two and the second two trials, but no 
further drop thereafter. However, the candidates given experimental 
tests at Psychological Research Unit No. 2 "were informed that the 
score made on the experimental tests did not count in any way as 
regards their ultimate classification for aircrew duties...whereupon 
some candidates entered the test with an audible sigh of relief" 
(Melton 1947, p. 827). It seems obvious that the test might have 
shown more validity - which proved to be zero - if it had been 
administered at the beginning of the testing period. 


Studies of Deterioration in the Performance of Tasks Simulating Flight. 


Bartlett, reviewing extensive studies of ‘fatigue’ made after 
the first World War, many of which were done under his sponsorship 
at Cambridge, or with his collaboration (Wyatt and Weston 1920; 
Muscio 1922; Vernon 1926; Pollock 1929; Wyatt and Langdon 1927), 
concluded that "the skill fatigue of daily life is not set up under 
these conditions"; i.e. by requiring subjects to repeat over and over 
again easy calculations, word and color recognition or other disjunctive 
tests. "Routine repetition of simple actions is not a 
characteristic of any highly skilled act, and least of all of work 
having a strong ‘mental’ component. The operations involved are 
marked by complex, coordinated and accurately timed activities" 
(1943, p. 247). Acting on this hypothesis, Bartlett initiated a 
series of studies aimed at analyzing disorganization of the skilled 
activities involved in piloting a plane (the Cambridge Cockpit 
studies: Craik 1940; Drew 1942; Bartlett 1943, 1947; and Davis 
1946a, 1946b, 1947, 1948). 


An important feature of the set-up in the Cockpit was the 
arrangement of the pointers and light sources on the panel board. 
The signals were arranged in three chief groups: in the middle, 
a group important throughout the whole period of the test; at one 
side, a group important only at certain stages, especially at the 
beginning and the end; to the other side, a group intermittently 
important indicating occurrences calling for prompt, but only 
occasional action. Above and below were stimuli which could be 
brought in at the experimenter’s will, calling for a specific 
response, less specifically bound up with the central task. 


Subjects for the test were trained pilots, the numbers in the 
different studies ranging from 34 to 355. They were given written 
instructions for four maneuvers, together occupying ten minutes, 
to be repeated between intervals of straight, level flying, the 
whole test usually taking two hours, with one test lasting four 
hours. In the course of a 2-hour run there was a progressive 
deterioration which amounted to 92 per cent in speed control, and an 
increase of as much as 400 per cent in the proportion of ‘large 
errors’ over ‘small ones’ in side-slip. There was a decrease in 
accuracy of timing, the estimates being, at times, as much as 200 
per cent in error. Deviations in the side-slip pointer increased 
from two or three to ten degrees, and finally swung from side to 
side over a wide range before anything was done. These measurements 
have less than one chance in a hundred of being accidental. 


A limitation of tests as complicated as those made with the 
Cambridge Cockpit is the difficulty of quantification of results. 
Results which appear to have considerable significance, however, 
are reported in qualitative terms. As the test proceeded, the 
panel, which at first was perceived as an integrated whole, to 
quote Bartlett (1943, p. 252) "split up, so that it became twenty 
or so separate recording instruments. And the controlling movements 
split up also, so that when any one was made it was not 
pictured in a pattern of machine control, but only as the correction 
of a particular instrument reading". There tended to be a regular 
sequence in deterioration of response to the display panel. The 
splitting of the stimulus field proceeded regularly from margin to 
center. "The merely occasional stimuli were the first to break 
away from the rest. There was a phase during which they were met 
by delayed, and often hurried, response. At length they were very 
frequently indeed ignored, to use psychological language they were 
‘forgotten’, and there was a definite and, as it might be called, 
‘stupid’ lapse of attention." For example, "with increasing fatigue, 
over and over the petrol signal was ignored, until the machine 
stopped and the experiment reached a temporary inglorious end" 
(pp. 252-253). 


With many subjects there was demonstrated a progressive 
tendency toward a lowering of the standard of performance without 
the subject's awareness of the fact. When more difficult instructions 
were introduced, some subjects improved quickly and the improvement 
was maintained for a long time. There were others, however, who 
improved but temporarily, soon slipping back to a lower level 
than before. 


Pilots complained of 'stickiness of attention', i.e., of 
preoccupation with one particular maneuver, other things being 
unduly neglected. For example, while gaining height accurately 
according to instructions the machine might be allowed to drift 
off-course. When the climb was completed, the deviation off- 
course would then be rapidly, often violently, corrected. This 
erratic tendency could be noticed and picked out from the graphic 
records. This effect is closely allied to, if not a part of, the 
more inclusive factor of poor timing. In fact, increasing 
inaccuracy of the timing response is one of the outstanding 
features of deterioration in performance. Inasmuch as the timing 
of the different elements in a complicated response is subject to 
objective measurement, it may provide a promising lead to follow 
in future experimentation. 


Some pilots also made "apparently unreasonable and stupid 
mistakes", which are described as ‘lapses’. These are exemplified 
by such performances as making the machine climb the number of feet 
it should have dived, or completely omitting a particular maneuver. 


On the basis of performance an attempt was made to differentiate 
normal and non-normal pilots, the latter group being further divided 
into an ‘overactive’ type (those whose total number of movements of 
the controls was high relative to duration of errors) and an ‘inert’ 
type (those for whom the total number of errors is high in relation 
to the total number of movements). The former are alleged to be 
‘hyperemotional’, while the latter are characterized as "bored, 
distracted, and emotionally indifferent". A relation to operational 
fatigue is suggested by the tendency for the same effects to appear 
in normal pilots as a result of long periods of performance in the 
apparatus. Further, poorer scores are reported to occur disproportionately 
in a group of ‘operationally fatigued’ pilots. Validity 
for the Cockpit test has, in some measure, been established by a 
demonstrated correlation between scores and accident proneness within 
a small group of pilots. 
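
As a rough illustration of the two ratios on which the ‘overactive’ and ‘inert’ descriptions rest, a minimal Python sketch follows. The record fields, function names and sample figures are hypothetical and are not taken from the Cockpit studies; no cut-off values are implied, since none are reported here. 

```python
from dataclasses import dataclass


@dataclass
class CockpitRun:
    """Hypothetical summary of one run; field names are illustrative only."""
    movements: int           # total number of control movements
    errors: int              # total number of errors
    error_duration_s: float  # total time spent in error, in seconds


def movements_per_error_second(run: CockpitRun) -> float:
    """Movements relative to duration of errors (high in the 'overactive' type)."""
    if run.error_duration_s <= 0:
        raise ValueError("error duration must be positive")
    return run.movements / run.error_duration_s


def errors_per_movement(run: CockpitRun) -> float:
    """Errors relative to number of movements (high in the 'inert' type)."""
    if run.movements <= 0:
        raise ValueError("movement count must be positive")
    return run.errors / run.movements


# Invented figures, for illustration only.
example = CockpitRun(movements=600, errors=30, error_duration_s=120.0)
print(movements_per_error_second(example))
print(errors_per_movement(example))
```

Both functions return plain ratios, so any classification into types would still require norms from a reference group of pilots. 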


The authors of the test acknowledge its limitations: the 
relatively long time required for administration, and limitation 
to relatively experienced pilots. The approach to problems of 
performance represented in these studies emphasizes the subject's 
generalized modes of attack on his tasks and problems rather than 
his specific skills. 


Studies of Operational Fatigue. 


In the various attempts made by aviation psychologists to find 
among returned aviators factors of personality related to operational 
fatigue and anxiety, use was made of a variety of measures (Wickert 
1947; Bijou 1947; Lepley 1947). It was discovered that tests that 
had been developed primarily as measures of intellectual, perceptual, 
and motor factors associated with success in flight training offered 
no promise of also predicting susceptibility to emotional disturbance 
after combat (Wickert 1947). Of special interest were the Aiming 
Stress, and Steadiness under Pressure tests, which had been definitely 
designed to measure resistance to verbally induced emotional responses. 
These, however, proved to be of no value in predicting the anxiety 
reaction after combat. 


Two types of personality assays were administered for the 
purpose of comparing the characteristics of normal returnees with 
those with anxiety reactions: (1) Instructor-Selection Tests, and 
(2) Inventories of Personal Characteristics. The Instructor-Selection 
Test, planned as an indirect method of determining certain personal 
characteristics, did not differentiate between a group of 95 anxiety 
cases and 576 controls at AAF Redistribution Station, No. 1, but it 
was found discriminatory at the 2-per cent level in contrasting 227 
anxiety cases with 300 controls, at Redistribution Station, No. 2. 


After successive item analyses a form of Personality Inventory 
(DE201C) was evolved which had an odd-even reliability of .768 
(corrected), and a validity of the order of .50 with the diagnosis 
of anxiety reaction (Wickert 1947). The items of this inventory 
confirm the picture of anxiety reaction as one of nervousness, poor 
sleep, disturbing dreams, fatigue, and somatic evidences of emotion. 
Significant items suggest the role of combat strain as an antecedent 
of anxiety reaction, or at least as a precipitating factor. Fear, 
loss of weight, and loss of zest for flying are highly significant 
features. 
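
The ‘corrected’ odd-even figure can be illustrated with a short Python sketch. It assumes that the correction is the Spearman-Brown step-up customarily applied to split-half correlations; the source does not name the formula, and the function names are illustrative. 

```python
def spearman_brown(r_half: float) -> float:
    """Step a half-test (odd-even) correlation up to full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def half_from_corrected(r_full: float) -> float:
    """Invert the step-up: the raw odd-even correlation implied by r_full."""
    return r_full / (2.0 - r_full)


# Assumption: the reported .768 is a Spearman-Brown corrected value.
raw = half_from_corrected(0.768)
print(round(raw, 3))                  # implied raw odd-even correlation
print(round(spearman_brown(raw), 3))  # stepping it up again recovers .768
```

On that assumption, the uncorrected correlation between the odd and even halves of the inventory would have been about .62. 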


A Sociological Questionnaire indicated that length of service 
overseas discriminated between the mild anxiety reaction patients 
and the more severe groups at the 1-per cent level of confidence. 
The more severe groups had spent 2.75 more months overseas than had 
the mild group. The patients discharged to civilian life because 
of the severity of their condition had averaged 3.5 months longer 
overseas than had those patients returned to duty. This difference 
was also significant at the 1-per cent level of confidence. Presumptively 
traumatic factors (e.g., crashes, death of companions, 
parachute jumps) were adjudged to be so subjectively conditioned as 
to be impossible of definition in such fashion as to give them 
uniform meaning in a questionnaire or check list. 


An analysis of the attitudes of combat flyers who were returned 
for reassignment in this country during 1944 and 1945, with reference 
to a second tour of aerial combat duty, however, showed that there 
was no consistent relationship between attitude toward a second combat 
tour and the statistics of the first tour expressed as number of 
combat hours, number of missions, or number of months overseas. "This 
leads to the most significant conclusion of the study when compared 
with the data on fear. How the man reacts psychologically to combat 
determines his attitude. The numerical count of hours, missions, and the 
like, does not" (Wickert, 1947, pp. 171-172). Both officers and 
men who had spent 12 or more months overseas were slightly more 
favorable toward return overseas than those who had been in the 
theater for a shorter time. 


The questionnaire employed by Grinker et al (1946) was found 
to be an unsatisfactory device for differentiating air crew personnel 
returned to this country with ‘operational fatigue’ from control 
groups. One study was made of 284 officers, another of 198 enlisted 
personnel. The questionnaire contained 121 items, dealing with the 
patient's precombat personality and with his behavior during combat. 
The conclusion drawn from this investigation is that more drastic 
screening on the basis of predisposition to combat fatigue would be 
bought only at the cost of a higher rejection of candidates who are 
fit. Moreover, ‘predisposition’ is only one cause of war neurosis. 
Among other factors are the type of stress to which the individual 
is exposed and the combat unit's morale. 


Most of the studies in which projective techniques have been 
employed with subjects exposed to vigil or operational fatigue have 
given negative results. As mentioned earlier, Tyler (1947) found 
no alterations in the Rorschach responses after as much as 112 
hours of sleep deprivation. In a study made at an AAF Convalescent 
Hospital (Bijou 1947), the records made by the patients were very 
similar to those made by the aviation student group used as a 
control sample, the control records having been made while the students 
were in the process of qualifying for aircrew training. Harrower and 
Grinker (1946) devised a "Stress Tolerance Test", which was administered 
at another of the AAF hospitals. A set of pictures of war scenes 
was sandwiched between a sample of ink-blots and TAT pictures. 
It is reported that patients suffering from operational fatigue 
were so greatly ‘disturbed’ by the war scenes that they made a 
significant percentage of failures to respond at all, or they 
‘personalized’ their responses to the second set of projective 
stimuli. Results obtained with the Bender Visual Motor Gestalt 
Test (which requires the subject to copy geometric designs) were 
inconclusive (Bijou 1947). An Incomplete Sentence Test (DESO%, 
Bijou 1947), requiring the subject to complete a sentence which 
has been started, gave some indication of validity as a means of 
differentiating groups of patients judged by psychiatrists to be 
fit for duty from those judged to have moderate or severe adjustment 
problems as a result of their war experiences. The reliability 
of the test by one method was "above 0.85", by another method, 
0.68. 


A very important qualitative factor, according to the testimony 
of Flight Surgeons, contributing to operational fatigue consists of 
such events as disrupt the esprit de corps of the crew and the squadron. 
The disorganization of the skill of an individual member of a crew is 
apparently as apt to result from a catastrophe experienced by some 
other member of the crew as from the physical conditions encountered 
during flight. This fact has been reported for all branches of the 
armed services (Wright 1945, 1946; Bartemeier et al 1946). Wright 
claims that the most universal motivating forces are those generated 
by the relationship of the man to his group. This conclusion is 
supported by the observation that when an Air Force is small, replacements 
few, losses and opposition severe, men have continued to fly 
well. When, on the other hand, the Air Force has grown large, is 
packed with replacements, and is suffering only occasional losses 
and opposition, men have much more frequently ‘broken down’. A 
similar position is expressed by Bartemeier, Kubie, Menninger, 
Romano and Whitehorn (1946): "When one considers the total pattern 
of defenses which are utilized by the soldier in combat, it appears 
that the most significant factor is the soldier's position in the 
constellation of his social group, the combat team..... Further 
confirmation of the importance of these group bonds appears in the 
nature of the precipitating factors in the ‘break’. The common 
denominator of the events experienced by the soldiers and related by 
them as precipitating factors were less frequently the ‘last straw’ 
in a quantitative sense than some event which necessitated a sudden 
change in the basic structure of the pattern of the soldier's group 
relationship....The soldier lost his group relationship and in losing 
it forfeited all the strengths and comforts with which it had 
sustained him. As a member of the team he would have been able to 
carry on; alone, he was overwhelmed and became disorganized". 


SUMMARY AND INTERPRETATION OF THE STUDIES OF FATIGUE, 
APPREHENSION AND STRESS 


Perusal of the studies of tests taken after loss of sleep, or 
long periods of work, indicates that any possible ‘fatigue’ that 
may have resulted is not readily detected by short, discrete tests.
In cases such as that illustrated by ten hours or more of truck-driving,
involving continued postural strain in the trunk and arms, however,
subsequent testing reveals a decrement in such tasks as involve 
postural and manual control (body sway, tapping time, manual steadiness 
and coordination). In situations of this sort one is probably 
measuring some of the functions that have been impaired by the day's 
work. But even with tests that are continued for hours there may not 
be significant decrement. This might be expected to be true of tasks 
that are readily automatized, like tapping, but reference to 
automatization will not explain the case of reading the works of
Adam Smith (Carmichael and Dearborn 1947). 


Several experimenters suggest that in tasks requiring constant 
attention the 'fatigue' and deterioration in performance are due
to tension, anticipatory or concurrent (Mackworth 1948a, Davis 1946, 
1948). Bartley and Chute have concluded that fatigue, "rather than 
being looked upon as some sort of physiological impairment, should 
be regarded as the pattern arising in a conflict situation" (1945, 
p. 169). Eye strain is a familiar example of tension commonly 
recognized as fatiguing. Bartley and Chute describe a situation in 
which conflict is set up involving the pupillary light reflex. "Slow 
changes in level of illumination elicit changes in the size of the
pupillary aperture by means of the reciprocal action of the opposing 
muscles. One set of muscles upon contracting dilates the aperture, 
and the contraction of the other set constricts it. If alternations 
from light to dark are rapid enough, both sets of muscles may come 
to contract simultaneously and a clear cut and simple example of 
incompatibility occurs" (1947, p. 218). Bartley cites another experi- 
ment in which conflict is set up between voluntary fixation and the 
reflex tendency to respond to a specific light source (Bartley 1942). 
Lights are set up 30 degrees to either side of the line of regard and 
the subject is instructed to focus an intermediate band, thus introduc-
ing incompatibility between the normal reciprocal innervation of the
extrinsic muscles, involved in the reflex process of focusing one or 
the other of the lights, on the one hand, and contractions required to 
focus the intermediate band, on the other. The ensuing tension is 
readily appreciated. 


It is presumably correct to say that when a person is adequately 
motivated he is free from debilitating tension, frustration, or conflict. 
On the other hand, one does not ‘do his best’ except when under the 
emotional reinforcement of the ‘right sort’ of tension. The problem is 
to discriminate between facilitating and decremental tensions. Davis 
(1946a, 1946b, 1948) attributes the deterioration that occurred in the 
tests with the Cockpit to ‘anticipatory tension’. 


Of the objective evidence furnished by the Cambridge studies a 
factor which merits special emphasis is the disruption in the timing 
of the performance. As Bartlett says, timing is of little significance 
in relation to single reactions, but when a single reaction must be 
fitted into a complex pattern it becomes of paramount importance.
With mounting tension it is the timing that goes wrong rather than the
efficiency of the local reactions. It is then that the rhythm of
sequence in the complex activity is lost; the performance becomes
irregular, "a story of spurts and delays" (Bartlett, 1943, p. 255).
This conclusion as to the importance of correct timing is supported
by an experiment with Landolt rings (broken circles with the gap
north, south, east, or west, in relation to the observer) (Markstein
1932). The time and duration of appearance of each item in the display
are automatically recorded, and similarly the beginning and end of
every reaction and of the intervening intervals. In the analysis of
deterioration in a subject's reactions to the successive presentation
of the rings it is observed that when the subject breaks down he is,
in almost all cases, trying to respond, not to the circle that is
immediately before him, but to the preceding circle, or to the second 
or third preceding circle. Bartlett reports that there is some 
evidence that much the same happens in tracking a rapidly moving 
target, and in dealing with radar displays in extremely rapid sequence 
(Bartlett 1947). In these instances it would appear that the receptor 
and effector series have got out of step. He cites the case of reading, 
in which the eye interprets far ahead of the voice, and those skills in 
which the posture has to be reset. "For then it is the accessory 
movements, of which normally the operator remains totally unconscious, 
which markedly and obviously go wrong. The footwork, which may settle 
the whole body balance, lags and sets everything else out of time." 
(Bartlett 1948, p. 36) 
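

The 'out of step' pattern described above can be illustrated with a
minimal sketch. It assumes, hypothetically, that the gap direction of
each ring and the direction reported by the subject are logged as two
lists of equal length; the function name estimate_lag, the max_lag
parameter and the sample record are illustrative only and are not
taken from the procedure of Markstein or Bartlett. For each candidate
lag the sketch counts how often a response agrees with the ring shown
that many presentations earlier; a best-matching lag above zero would
suggest that the subject is answering an earlier ring.

```python
def estimate_lag(stimuli, responses, max_lag=3):
    """Return (best_lag, match_rate), comparing response i with stimulus i - lag."""
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have equal length")
    best_lag, best_rate = 0, -1.0
    for lag in range(max_lag + 1):
        # Pair each response with the stimulus shown `lag` presentations earlier.
        pairs = [(stimuli[i - lag], responses[i]) for i in range(lag, len(responses))]
        if not pairs:
            break
        rate = sum(s == r for s, r in pairs) / len(pairs)
        if rate > best_rate:
            best_lag, best_rate = lag, rate
    return best_lag, best_rate


# Hypothetical record: every report is one ring late.
stimuli = ["N", "E", "S", "W", "E", "N", "S", "W"]
responses = ["W", "N", "E", "S", "W", "E", "N", "S"]
lag, rate = estimate_lag(stimuli, responses)
```

With this invented record the best-matching lag is one presentation.
In practice the lag would probably be estimated over a moving window,
since the breakdown is described as developing during the task.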


Analysis of the disruption of correct timing is directly related 
to the splitting of the stimulus field which was found to occur in the 
Cockpit studies. As reported earlier, as the task continued the stimula-
ting panel, which was at first perceived as a pattern, split up into
twenty or so separate instruments. There was a corresponding splitting 
up of the controlling movements. As a rule the splitting occurred 
regularly from margin to center. The occasional stimuli were met with 
delayed, and often hurried, responses; at length they tended to be 
completely ignored. 


Another observation to be drawn from the Cockpit studies, more 
difficult to quantify, is the phenomenon of 'lowered standard'. As
Bartlett says: "....within the skill there are always two (discrimina-
tion) thresholds - one a measure of what the observer can do, and the
other of what is treated as worth doing. These can, and constantly do, 
vary quite independently. At the beginning of exercise they normally 
approximate to the same value though they are never quite identical. 
With continued exercise, or under a variety of other conditions, they 
diverge more and more. The threshold of discrimination - what the 
operator can do - is little affected, except in extreme cases; the 
threshold of indifference - what is treated as worth doing - may rise 
to double, treble, or quadruple its original value.... The operator 
may know nothing about it. He may assert that his skill is exactly 
as it was, and if he is stopped and his threshold of discrimination 
measured he may appear to be right. For a genuine measure of his 
skill he needs to have both these thresholds determined within the 
operation itself" (Bartlett 1948, p. 38). 


Two generalizations may now be made in concluding this section 
of the report. In the first place, when it comes to constructing 
tests to measure the process of deterioration in the performance of 
a skilled task one may well recognize as a basic generalization the 
configurationist’s dictum: the whole is something more than a sum 
of the parts. Measurement of decrement in isolated reactions throws 
little light upon deterioration in a complex skill, even though,
as Section I shows, many isolated reactions are sensitive to 
decrement. Techniques of measurement must be employed which will 
record what is happening simultaneously, as well as successively, 
in the complete sensory-motor pattern. Bartlett's emphasis upon 
the importance of accurate timing of the constituent elements 
suggests a method of approach which should prove highly fruitful. 


A second basic consideration lies in the all important area 
of motivation. It has long been recognized that the problems of 
fatigue in aviation are psychological. Grow (1936) stated that the 
true cause of fatigue, rather than being muscular exertion, is to 
be found in "instinctive and premature fear". McFarland concluded:
"Although one can demonstrate that certain muscles are subject to
loss of efficiency in ergographic studies, this sheds little light
on the fatigue problem in aviation, in which the pilot's musculature
is not used excessively..... The emphasis has been placed upon
psychological factors, such as worry and mental and emotional
conflict" (1941a, pp. 4 and 12).


One reason why inconsistent results have so often been obtained 
in the attempts to measure the effects of fatigue is that the
motivating conditions have not been comparable. In spite of the 
fact that job sampling has definite limitations as a method for 
studying deterioration in skill, it does have the advantage of 
providing motivation comparable to that in which the task itself 
is performed. One great difficulty of most of the tests that have 
been used to measure fatigue is that the activation has been
artificial. Tests that will effectively measure deterioration in 
the skills of aviation need to be administered under tensions and 
aspirations comparable to those under which those skills actually 
function professionally. The tests will need to measure not only 
what the subject is capable of doing but what he considers worth 
doing, the relationship between his level of aspiration and his 
threshold of ability, under the stress of tension-producing 
circumstances. 
APPENDIX


A-1. Tests of Visual Function.


For purposes of this report tests of visual function include 
measures of visual acuity, brightness discrimination, dark adapta- 
tion, area of visual field, area of blind spot, accommodation, depth 
perception, and of other functions involving what is commonly 
accepted as the purely 'receptive' function of the eyes. These
tests cannot be entirely overlooked in a study of the present sort 
since a majority of major investigations of performance under 
various deleterious conditions have included one or more tests of 
visual reception in the battery used. In so far as some of the 
tests listed may be employed as indices of psychological functions
other than purely sensory, they are of interest. Also they are of 
importance in many studies as controls on the possibility that 
apparent deficit in more complex processes is actually due to 
deficiency in the visual process, per se. To subserve these purposes 
a representative list of visual tests has been appended as Table A-1,
although, because of the diversity of tests employed, results are 
not tabulated. Investigations employing visual tests to detect 
behavioral inefficiency under anoxic conditions will be seen from 
Table A-1 to be especially numerous. 


A-2. Critical Flicker Frequency Tests. 


Among the tests of visual function those dealing with critical 
frequency of flicker fusion are singled out for special attention 
both because of the bulk of work that has been done with them, and, 
in addition, because they make apparent the logic of much of the 
research effort in the general area under discussion. The relatively 
great amount of emphasis given to the test seems to stem, in part, 
from an interest in finding an involuntary response, which is 
presumably less sensitive to learning than voluntary responses, and 
which, consequently, cannot be compensated for to any great extent 
by voluntary effort. In addition, C.F.F. tests offer the possibility 
of providing an objective yardstick which would be closely related to
changes in the environmental variable (e.g. hours of fatiguing activity)
and/or to subjective reports of 'fatigue', and against which other
measures of psychological performance might be validated.


The minimum frequency per unit time of a given illumination which 
just results in the report of fusion, rather than discrete flashing, 
by an observer, is conventionally known as the critical flicker limen.
A large amount of work done on this phenomenon has shown that the
flicker fusion threshold varies with a number of factors such as
brightness of illumination, color, size of test area, and the like. 
Several methods have been used to test this function. The newer 
techniques substitute an electronic apparatus with a neon-glow lamp 
for the older motor driven disc apparatus. Descriptions of standard 
methods and techniques will be found in Jones et al (1941),
Draeger and Fauley (1943), Henry (1942), and Misiak (1947). 
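

As an illustration of how such a limen might be estimated, the
following is a minimal sketch that assumes a method-of-limits
procedure; the sources cited above are not claimed to have used this
exact scheme. Flash frequency is stepped up or down and the observer
reports at each step whether the light appears fused. The function
names, the step values (in flashes per second) and the reports are
hypothetical.

```python
def transition_point(frequencies, reports_fused):
    """Midpoint between the two adjacent frequencies where the report first changes."""
    for i in range(1, len(frequencies)):
        if reports_fused[i] != reports_fused[i - 1]:
            return (frequencies[i] + frequencies[i - 1]) / 2
    raise ValueError("no change of report in this series")


def cff_method_of_limits(series):
    """Average the transition points of ascending and descending series."""
    points = [transition_point(freqs, fused) for freqs, fused in series]
    return sum(points) / len(points)


# Hypothetical data: one ascending and one descending series.
ascending = ([36, 38, 40, 42, 44], [False, False, False, True, True])
descending = ([46, 44, 42, 40, 38], [True, True, True, False, False])
cff = cff_method_of_limits([ascending, descending])
```

Averaging ascending and descending series is one common way of
balancing the observer's tendency to persist in the previous report;
since the threshold varies with brightness, color and size of test
area, those factors would need to be held constant across series.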


C.F.F. tests have been used by several investigators under
conditions of simulated altitude with results consistently
showing a decrement (Seitz 1940; Lilienthal and Fugitt 1945;
Vollmer et al 1946; Birren et al 1946). Under fatigue and allied
conditions the results have been less conclusive: two studies
(Simonson and Enzer 1941; Jones et al 1941) show decrements,
while two others yield none (Tyler 1947; Brozek and Keys 1944).
Excessive CO2 and oxygen diminution had no effect on C.F.F.
Findings under other conditions are summarized in the accompanying
Table A-2.


The reliability of a test of the present type has been 
reported by Misiak to be high (.93, test-retest, 3rd with 10th day).
However, the fact that the reliability was low when the scores
obtained on the 1st and 10th days were compared suggests a fairly
strong initial habituation, or practice effect. 
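

A test-retest coefficient of this kind is ordinarily a correlation
between the scores of the same subjects on two occasions. The sketch
below assumes a Pearson product-moment coefficient, which the source
does not specify; the scores are invented for illustration and are
not Misiak's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists with at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on one occasion do not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical C.F.F. scores for five subjects on the 3rd and 10th days.
day3 = [40.0, 42.5, 38.0, 45.0, 41.0]
day10 = [40.5, 42.0, 38.5, 44.5, 41.5]
r = pearson_r(day3, day10)
```

Computing the same coefficient between the first day and a later day
would show whether early sessions agree less well with later ones,
which is the pattern the text attributes to habituation or practice.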


A-3. Tests of Auditory Function. 


Tests of auditory acuity, pitch discrimination, as well as of 
other functions involving the ear, have been employed in the 
measurement of performance decrement. References cited provide, 
in view of the purpose of the present survey, only a sample of 
work in this field. For the most part, little decrement in simple 
auditory function has been reported under conditions studied. 
However, it appears to be fairly well founded that intelligibility 
of speech is diminished (McFarland 1946) at altitude, and Van Liere 
(1942), in a review article containing an extensive bibliography 
of the subject, has reported some decrement in response to auditory 
stimulation, depending, apparently, on degree of anoxia and length 
of exposure. 


A-4. Tests of Other Sensory Functions. 


Reasons that determined a minimum consideration of vision 
and audition are applicable to the remaining sensory fields. 
References cited are representative of techniques employed to
detect sensory deficit under the conditions listed.


A-5. Measures of Physiological Correlates.


The accompanying Table is based on a residuum of references
not included in the body of the paper since the problem of physio-
logical concomitants of performance was deemed to fall outside
of the present study. With few exceptions, results obtained
with these measures under the conditions indicated have failed
to demonstrate any clear cut relation either to performance or
to a criterion useful for selection. The list of tests is believed
to sample the field of studies of the type under consideration, but
may not be considered exhaustive.


A-6. Tests of Eye-Movement and Frequency of Blinking.


Since ocular-motor functions have been demonstrated to be 
influenced by several environmental conditions, brief mention is 
made of the tests that have been used to measure them. Eye-movements 
are usually recorded by either of two methods: (1) by means of an 
ophthalmograph which is essentially a camera with dual lenses designed 
to record images reflected from the two corneas on a continuously 
moving strip of film; and (2) by means of electrical apparatus which 
indicates changes in the corneo-retinal potential as this is determined 
by movements of eyes in their orbits. For a description of tests of 
the present type, apparatus and techniques, the interested reader is 
referred to the work of Jones et al (1941) and that of Hoffman, 
Wellman and Carmichael (1939). 


Impaired control of saccadic eye movements has been reported 
under several of the conditions listed, including altitude. For a
discussion of reliability of the ophthalmographic method, generally 
reported as high, the work of Tinker (1936) is cited. McFarland 
and his collaborators (1937) call attention to the test as one 
especially sensitive to altitude, and further, as one that may be 
presumed to be uninfluenced by ‘compensation’ since the subject is 
unaware of impairment. 


Frequency of blinking movements of the eye-lid is recorded
by the same methods described above for eye-movements. Luckiesh 
and Moss (1937) have offered evidence showing that rate of blinking 
is a positive function of the duration of a visual task, and is 
therefore an index of visual fatigue. However, Bitterman and his 
collaborators (1945, 1946, 1947) have failed to confirm the findings 
of Luckiesh and Moss. Carpenter (1948) has reopened the present 
issue by showing a marked rise in rate of blinking from the
first to the second half hour's performance on the 'clock test'
previously described. 


A-7. Strength of Grip Tests.


The references listed in Table A-7, while they neither exhaust 
the varieties of the test, nor the applications to which they have 
been put, are believed to be representative of work in the present 
field. In the most frequently used test of strength, the hand grip 
is measured by instructing the subject to raise a dynamometer gripped 
in the hand, to the level of the head, then to bring it down
quickly, exerting maximal pressure. Three tests are given with
each hand alternately, and the best three counted as the score.
Other tests similar in principle have been designed to measure
strength of the back and legs. For descriptions of standard
Smedley apparatus and technique reference is made to Whipple
(1914), Garrett and Schneck (1933), and Gray and Trowbridge (1942).
Fisher and Birren (1946) describe a modified apparatus, 
calibrated on a different scale, together with an improved way 
of administering the test. 


Strength of grip tests given under a wide variety of conditions
have yielded preponderantly negative results.
Decrements are reported under the conditions of cold (Horvath
and Freedman 1947) and heat (although the test used under the
latter condition is more one of physical endurance than strength)
(Mackworth 1947), and following a 'fatigue run' (Fisher and Birren
1946). Improved performance on a dynamometric test has been
demonstrated with the administration of benzedrine (Thornton,
Holck, and Smith 1939). Other conditions, including drugs, diet,
toxic fumes, and fatigue, resulted in no impairment in strength,
or were indeterminate.


The reliability of a test of the present type is reported
by Fisher and Birren (1946) to be high, achieving split-half
values of .91 - .92 (corrected). Test scores are reported by 
the same investigators to be somewhat influenced by practice. 
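

A 'corrected' split-half value is usually one that has been stepped
up from the half-test correlation to the full test length. The sketch
below assumes the Spearman-Brown formula was the correction applied,
which the text does not state; the half-test correlation of .84 is a
hypothetical input chosen only to show the arithmetic.

```python
def spearman_brown(r_half):
    """Step up a half-test correlation to an estimate for the full-length test."""
    if r_half <= -1.0:
        raise ValueError("r_half must be greater than -1")
    return 2 * r_half / (1 + r_half)


# Hypothetical half-test correlation; the corrected value is roughly .91.
corrected = spearman_brown(0.84)
```

The step-up always raises a positive half-test correlation, so a
corrected figure should not be compared directly with an uncorrected
test-retest coefficient.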


Evidence is lacking to show that strength of grip is an
index of any psychological function or, in fact, that it represents
anything more than the strength of the bodily member tested. The 
work of Keys et al (1941, 1944, 1945) and of Brozek et al (1946) 
justifies the conclusion that such tests are remarkably resistant, 
even to extreme conditions of deficiency. Intercorrelations of 
dynamometric test results with those obtained with psychomotor 
tests have been uniformly reported to be low or zero.


A-8. Tests of General Intelligence. 


In spite of the widespread employment of tests of general 
intelligence in the detection of deficit in performance, there 
appear to be several reasons for devoting a minimum of considera- 
tion to them in the present treatment. Foremost, as will be seen 
from the accompanying table, the outcome of a majority of these 
efforts has been negative. Almost of equal importance is the 
fact that intelligence testing has been exclusively concerned with 
the problem of predicting success and failure in highly specific 
and complex life situations rather than with the separate task of 
isolating basic psychological functions of more general inter-
pretability. In addition to offering a coarse mesh with which
to measure deficit, intelligence tests are probably too complex
to shed much light on the factors being measured.


A number of individual test items or types of items included 
within tests of intelligence which have been tried out as tests
of decrement have been included under other headings (see
'arithmetic computation' and 'logical relations', 'immediate
memory', etc.).


In Table A-8 a representative list of general intelligence
tests which have been used in measuring deficit is appended. For 
statements of techniques, materials, reliabilities and interpreta- 
tions of these tests the reader is referred to the source materials.
BIBLIOGRAPHY


In the compilation of the bibliography the appended lists of 
journals and general sources were searched systematically for studies 
dealing with the conditions associated with aircraft flight or with 
other conditions resulting in performance impairment, namely, altitude, 
noise, vibration, temperature, humidity, sleep privation, fatigue, 
stress, drugs or dietary modifications. After preliminary survey, 
articles dealing primarily with the following subjects were largely 
excluded: physiological, clinical and psychiatric effects; physical 
fitness; physiological or muscular work output; sensory or perceptual 
tests; selection, classification and training of service personnel;
noise studies in communication; radar and other military performance; 
subjective observations; changes in personality, general intelligence; 
and similar studies. 


A residuum of articles, in which quantitative estimation of 
performance was attempted, remained. The bibliographies of these 
articles gave leads to others in journals not included in the search. 
In this phase of the work an attempt was made to be as comprehensive 
as possible in the time available. Because of the limited scope no 
attempt was made to survey the literature other than in English, with
the exception of Industrielle Psychotechnik. 


After the compilation of the quantitative tests, which had been 
used to estimate decrement under deleterious conditions, was completed, 
a further series of studies was consulted for test descriptions, 
modifications, evaluation of technique and interpretation of function. 
The articles of this sort do not constitute an exhaustive list for 
any given test, nor are they necessarily the best in the field, but 
are rather the ones which have been most influential in determining 
the course of experiments in the field of decrement testing. 
Bulletin of the Canadian Psychological
Canadian Journal of Psychology
1916-1948
1940-1948
1940-1948
1930-1946
1930-1946
1919-1920
1940-1948
CAA, Division of Research, Reports 1-77.


Medical Research Council, Great Britain, Industrial Health Research
Board, Report Series, 1-90.


General Sources


Bibliography of Scientific and Industrial Reports. 